Method and apparatus for computing the self-motion of moving imaging devices.

Determining the self-motion in space of an imaging device (e.g., a television camera) by analyzing image
sequences obtained through the device. Three-dimensional self-motion is expressed as a combination of
rotations about the horizontal and vertical camera axes and the direction of camera translation. The
invention computes the rotational and translational components of the camera's self-motion exclusively from
visual information. Robust performance is achieved by determining the direction of heading (i.e., the focus of
expansion) as a region instead of a single location on the image plane. The method can be used
when the direction of heading is outside the current field of view and when there is zero or very small
camera translation.
METHOD AND APPARATUS FOR COMPUTING THE SELF-MOTION OF MOVING IMAGING DEVICES

Field of the Invention 

The present invention pertains to a method and apparatus for the determination of three-dimensional 
self-motion of moving imaging devices according to the preamble of the independent claims. 

Background of the Invention 

Visual information is an indispensable clue for the successful operation of an autonomous land vehicle. 
Even with the use of sophisticated inertial navigation systems, the accumulation of position error requires
periodic corrections. Operation in unknown environments or mission tasks involving search, rescue, or 
manipulation critically depend upon visual feedback. 

Assessment of scene dynamics becomes vital when moving objects may be encountered, e.g., when
the autonomous land vehicle follows a convoy, approaches other vehicles, or has to detect moving threats. 
For the given case of a moving camera, such as one mounted on the autonomous land vehicle, image 
motion can supply important information about the spatial layout of the environment ("motion stereo") and 
the actual movements of the land vehicle. 

Previous work in motion analysis has mainly concentrated on numerical approaches for the recovery of
three-dimensional (3-D) motion and scene structure from two-dimensional (2-D) image sequences. The most
common approach is to estimate 3-D structure and motion in one computational step by solving a system
of linear or non-linear equations. This technique is characterized by several severe limitations. First, it is
known for its notorious noise-sensitivity. To overcome this problem, some researchers have extended this
technique to cover multiple frames. Secondly, it is designed to analyze the relative motion and 3-D structure
of a single rigid object. To estimate the egomotion of an autonomous land vehicle (ALV) carrying the imaging
device or camera, and the accompanying scene structure, the environment would have to be treated as a
large rigid object. However, rigidness of the environment cannot be guaranteed due to the possible
presence of moving objects in the scene. The consequence of accidentally including a moving 3-D point
in the system of equations representing the imaged environment would, in the best case, be a solution (in
terms of motion and structure) exhibiting a large residual error, indicating some non-rigid behavior. The
point in motion, however, could not be immediately identified from this solution alone. In the worst case (for
some forms of motion), the system may converge towards a rigid solution (with small error) in spite of the
actual movement in the point set. This shows a third limitation: there is no suitable means of
expressing the ambiguity and uncertainty inherent to dynamic scene analysis.

It is therefore the object of the present invention to devise a method and an apparatus with which the 
self-motion of an imaging device can be unambiguously evaluated. This object is achieved by the 
characterizing features of the independent claims. Advantageous embodiments of the inventive method and 
apparatus may be taken from the dependent claims. 

The invention, which solves the aforementioned problems, is novel in two important aspects. First, the scene
structure is not treated as a mere by-product of the motion computation but as a valuable means to
overcome some of the ambiguities of dynamic scene analysis. The key idea is to use the description of the
scene's 3-D structure as a link between motion analysis and other processes that deal with spatial
perception, such as shape-from-occlusion, stereo, spatial reasoning, etc. A 3-D interpretation of a moving
scene can only be correct if it is acceptable to all the processes involved.

Secondly, numerical techniques are largely replaced by a qualitative strategy of reasoning and 
modeling. Basically, instead of having a system of equations approaching a single rigid (but possibly 
incorrect) numerical solution, multiple qualitative interpretations of the scene are maintained. All the 
presently existing interpretations are kept consistent with the observations made in the past. The main
advantage of this approach of the present invention is that a new interpretation can be supplied immediately
when the currently favored interpretation turns out to be implausible.

The problem of determining the motion parameters of a moving camera relative to its environment from 
a sequence of images is important for applications of computer vision in mobile robots. Short-term control,
such as steering and braking, navigation, and obstacle detection/avoidance are all tasks that can effectively 
utilize this information. 
Summary of the Invention



The present invention deals with the computation of sensor platform motion from a set of displacement
vectors obtained from consecutive pairs of images. It is directed for application to autonomous robots and
land vehicles. The effects of camera rotation and translation upon the observed image are overcome. The
new concept of the "fuzzy" focus of expansion (FOE), which marks the direction of vehicle heading (and
provides sensor rotation), is exploited. It is shown that a robust performance for FOE location can be
achieved by computing a 2-D region of possible FOE-locations (termed "Fuzzy FOE") instead of looking for
a single-point FOE. The shape of this fuzzy FOE is an explicit indicator of the accuracy of the result. Given the
fuzzy FOE, a number of very effective inferences about the 3-D scene structure and motion are possible
and the fuzzy FOE can be employed as a practical tool in dynamic scene analysis. The results are demonstrated
on real motion sequences.

The problem of understanding scene dynamics is to find consistent and plausible 3-D interpretations for
any change observed in the 2-D image sequence. Due to the motion of the autonomous land vehicle (ALV),
containing the scene sensing device, stationary objects in the scene generally do not appear stationary in
the image, whereas moving objects are not necessarily seen in motion. The three main tasks of the present
approach for target motion detection and tracking are: 1) to estimate the vehicle's motion; 2) to derive the
3-D structure of the stationary environment; and 3) to detect and classify the motion of individual targets in
the scene. These three tasks are interdependent. The direction of heading (i.e., translation) and rotation of
the vehicle are estimated with respect to stationary locations in the scene. The focus of expansion (FOE) is
not determined as a particular image location, but as a region of possible FOE-locations called the fuzzy
FOE. We present a qualitative strategy of reasoning and modeling for the perception of 3-D space from
motion information. Instead of refining a single quantitative description of the observed environment over
time, multiple qualitative interpretations are maintained simultaneously. This offers superior robustness and
flexibility over traditional numerical techniques, which are often ill-conditioned and noise-sensitive. A
rule-based implementation of this approach is discussed and results on real ALV imagery are presented.

The system of the present invention tracks stationary parts of the visual environment in the image
plane, using corner points, contour segments, region boundaries and other two-dimensional tokens as
references. This results in a set of 2-D displacement vectors for the selected tokens for each consecutive
pair of camera images. The self-motion of the camera is modeled as two separate rotations about the horizontal
and the vertical axes passing through the lens center and a translation in 3-D space. If the camera performs
pure translation along a straight line in 3-D space, then (theoretically) all the displacement vectors extend
through one particular location in the image plane, called the focus of expansion (FOE) under forward
translation or focus of contraction (FOC) under backward translation. The 3-D vector passing through the
lens center and the FOE (on the image plane) corresponds to the direction of camera translation in 3-D
space.

The invention can provide the direction of instantaneous heading of the vehicle (FOE) within 1°, and the
self-motion of moving imaging devices can be accurately obtained. This includes rotations of ±5° or larger
in horizontal and vertical directions. To cope with the problems of noise and errors in the displacement field,
a region of possible FOE-locations (i.e., the fuzzy FOE) is determined instead of a single FOE.

In practice, however, imaging noise, spatial discretization errors, etc., make it impractical to determine
the FOE as an infinitesimal image location. Consequently, a central strategy of our method is to compute a
region of possible FOE-locations (instead of a single position), which produces more robust and reliable
results than previous approaches.



Brief Description of the Drawings 


Figure 1 is a functional block diagram of the present invention.

Figure 2 is a diagram showing an extended application of the invention to three-dimensional scene
construction.

Figure 3 shows a camera model and corresponding coordinate system.

Figure 4 illustrates a successive application of horizontal and vertical rotation of the camera.

Figure 5 diagrams the effect of a focus of expansion location (FOE) for pure camera translation.

Figure 6 illustrates the concept of the FOE for discrete time steps during the motion of a vehicle
having the camera.

Figure 7 shows the amount of expansion from the FOE for discrete time steps.

Figures 8a and 8b display a displacement field caused by horizontal and vertical rotation and
translation of the camera and a derotated displacement field, respectively.

Figures 9a and 9b show an image plane and a rotation space, respectively.

Figures 10a and 10b show mappings of a polygon from rotation space to an image plane and vice
versa, respectively.

Figures 11a-c illustrate a changing rotation polygon.

Figure 12 illustrates an intersection of displacement vectors with a vertical line which, if moved,
changes the variance of intersection.

Figure 13 shows a displacement field used to evaluate various error functions.

Figures 14a-d reveal the standard deviation of intersection at a vertical cross section at position x for
different amounts of vertical rotation.

Figures 15a-d reveal the standard deviation of intersection (square root) at a vertical cross section at
position x for different amounts of vertical rotation with no horizontal rotation and with no pixel noise applied
to the image locations.

Figures 16a-d reveal the standard deviation of intersection (square root) at a vertical cross section at
position x for different amounts of vertical rotation with no horizontal rotation and with ±1 pixels of noise
applied to the image locations.

Figures 17a-d reveal the standard deviation of intersection (square root) at a vertical cross section at
position x for different amounts of vertical rotation with no horizontal rotation and with ±2 pixels of noise
applied to the image locations.

Figures 18a-d show the location of minimum intersection standard deviation under varying horizontal
rotation with the horizontal location of the FOE marked xf.

Figures 19a-d show the amount of minimum intersection standard deviation under varying horizontal
rotation and with no noise added to the image locations.

Figures 20a-d show the amount of minimum intersection deviation under varying horizontal rotation
and with ±2 pixels of noise added to the image locations.

Figure 21 illustrates intersecting displacement vectors with two vertical lines which lie on the same
side of the FOE.

Figures 22a-d show the correlation coefficient for the intersection of displacement vectors at two
vertical lines under varying horizontal rotations with no noise added.

Figures 23a-d show the correlation coefficient for the intersection of displacement vectors at two
vertical lines under varying horizontal rotations and with ±2 pixels of noise added to the image locations.

Figure 24 illustrates displacement vectors and measurement of their error.

Figure 25 shows how to determine the optimum two-dimensional shift for a set of displacement
vectors.

Figure 26 reveals how FOE locations are dismissed if the displacement field resulting from the
application of the optimal shift results in a vector not pointing away from the FOE.

Figures 27a-d illustrate the displacement field and minimum error at selected FOE locations.

Figures 28a-e display the effects of increasing the average length of displacement vectors upon the
shape of the error function.

Figures 29a-e display the effects of increasing residual rotation in a horizontal direction upon the
shape of the error function for relatively short vectors.

Figures 30a-e display the effects of increasing residual rotation in a vertical direction upon the shape
of the error function for relatively short vectors.

Figures 31a-e show the effects of increasing residual rotation in horizontal and vertical directions
upon the shape of the error function for relatively short vectors.

Figures 31f-j indicate the amount of optimal linear shift obtained under the same conditions as in Figures
31a-e.

Figures 32a-e show the effects of uniform noise applied to image point coordinates for a constant
average vector length.

Figures 33a and 33b reveal the different effects of uniform noise applied to image point coordinates
for shorter and longer average vector lengths, respectively.

Figure 34 reveals a side view of a camera traveling parallel to a flat surface.

Figures 35a-i show an original image sequence taken from a moving vehicle after edge detection and
point detection with the selected points located at the lower-left corners of their marks.

Figures 35j-p show the original image sequence after edge detection and point selection.

Figures 36a-p illustrate the displacement vectors and estimates of vehicle motion for the image
sequence shown in Figures 35a-p.

Figure 37 illustrates a specific hardware implementation of the embodiment of the invention.

Description of the Present Embodiment 
Figures 1 and 2 reveal the main portions of the present invention. Figure 2 also expands on 3-D
interpretations of 2-D images in items 118 and 120 [...] Figure 1. First, significant features (points,
[...]) are extracted by feature extraction and tracking 114 from the image data 112, and 2-D displacement
vectors are computed for this set of features. For the examples shown [...] between individual frames by
[...]. Automatic techniques [...] in the related art. In the second step, the vehicle's direction of
translation, i.e., the focus of expansion (FOE), and the amount of rotation in space are determined. The
effects of vehicle motion on the FOE computation are described below. Nearly all the necessary numerical
computation is performed in [...], which is also described below. The third step (2-D change analysis 118)
constructs a 3-D model of the scene. Also disclosed are the concepts and operation of the qualitative
scene model [...]. Experiments with the present invention on real imagery [...].

System 110 contains the following three main components: token tracking 114, FOE seeker 124 and
optimal derotation 122, shown in Figure 1. The 2-D displacement vectors for selected image tokens are
determined in the first stage (token tracking 114). Since those displacement vectors are caused by
some arbitrary and (at this point) unknown camera motion, including camera rotations, they do not yet
exhibit the characteristic radial pattern of pure camera translation.

The second component (FOE seeker 124) selects a set of candidate locations for the FOE and forms a
connected image region of feasible FOE-locations plus the corresponding camera rotations, based
on the results from the third component (optimal derotation 122). A particular FOE-location is feasible if the
corresponding error value (computed by the optimal derotation module 122) is below some dynamically
adjusted threshold. The size of the final FOE-region reflects the amount of uncertainty contained in the
visual information (large regions reflect high uncertainty).

The third component (optimal derotation 122) determines the optimal 3-D camera rotations for a
particular (hypothesized) FOE-location. This is accomplished by simulating the effects of reverse camera
rotations upon the given set of displacement vectors. The camera is virtually rotated until the modified
displacement field is closest to a radial expansion pattern with respect to the selected FOE-location. Module
122 returns the necessary amount of reverse rotation and the deviation from a radial displacement field (i.e.,
an error value).

Component 123, comprising FOE seeker 124 and optimal derotation 122, represents the fuzzy FOE
means which outputs the region of possible FOE locations.

The first step of the present invention is to estimate the vehicle's motion relative to the stationary
environment using visual information. Arbitrary movement of an object in 3-D space, and thus the movement
of the vehicle itself, can be described as a combination of translation and rotation. While knowledge about
the composite vehicle motion is essential for control purposes, only translation can supply information about
the spatial layout of the 3-D scene (motion stereo). This, however, requires the removal of all image effects
resulting from vehicle rotation. For this purpose, we discuss the changes upon the image that are caused
by individual application of the "pure" motion components.

It is well-known that any rigid motion of an object in space between two points in time can be
decomposed into a combination of translation and rotation. While many researchers have used a
velocity-based formulation of the problem, the following treatment views motion in discrete time steps.

The viewing geometry involves a world coordinate system (X Y Z) as illustrated in Figure 3. Figure 3
shows the camera-centered coordinate system 130, lens center 126, image plane 128, and angles $\phi$,
$\theta$ and $\psi$ of rotation. The origin O of coordinate system 130 is located at lens center 126. Focal
length $f$ is the distance between lens center 126 and image plane 128. Each 3-D point (X Y Z) is mapped
onto an image location (x y). Angles $\phi$, $\theta$ and $\psi$ specify angles of camera rotation about
the X, Y and Z axes, respectively.

Given the world coordinate system, a translation $\mathbf{T} = (U\ V\ W)^T$ applied to a point in 3-D
$\mathbf{X} = (X\ Y\ Z)^T$ is accomplished through vector addition:

$$ \mathbf{X}' = \mathbf{T} + \mathbf{X} = \begin{bmatrix} X + U \\ Y + V \\ Z + W \end{bmatrix} \tag{1} $$

A 3-D rotation R about an arbitrary axis through the origin of coordinate system 130 can be described by
successive rotations about its three axes:

$$ R = R_\phi R_\theta R_\psi \tag{2} $$

where

$$ R_\phi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{bmatrix} \quad \text{rotation about the X-axis,} \tag{3a} $$

$$ R_\theta = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} \quad \text{rotation about the Y-axis,} \tag{3b} $$

$$ R_\psi = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{rotation about the Z-axis.} \tag{3c} $$



A general rigid motion in space consisting of translation and rotation is described by the transformation

$$ M:\ \mathbf{X} \rightarrow \mathbf{X}' = R_\phi R_\theta R_\psi (\mathbf{T} + \mathbf{X}) \tag{4} $$

Its six degrees of freedom are $U$, $V$, $W$, $\phi$, $\theta$, $\psi$.

This decomposition is not unique because the translation could as well be applied after the rotation.
Also, since the multiplication of the rotation matrices is not commutative, a different order of rotations would
result in different amounts of rotation for each axis. For a fixed order of application, however, this motion
decomposition is unique.
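
As an illustration of transformation (4) and of the remark on rotation order, the following is a minimal Python sketch. It assumes the standard right-handed rotation matrices about the X-, Y- and Z-axes; the function names (`rot_x`, `rigid_motion`, and so on) are illustrative and are not part of the disclosure.

```python
import math

def rot_x(phi):
    """Rotation by phi (radians) about the X-axis (assumed right-handed convention)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(theta):
    """Rotation by theta (radians) about the Y-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(psi):
    """Rotation by psi (radians) about the Z-axis."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def rigid_motion(point, translation, phi, theta, psi):
    """X' = R_phi R_theta R_psi (T + X): translate first, then rotate."""
    shifted = [p + t for p, t in zip(point, translation)]
    rotation = matmul(rot_x(phi), matmul(rot_y(theta), rot_z(psi)))
    return matvec(rotation, shifted)

# Swapping the order of two rotations gives a different matrix,
# which is why the order of application has to be fixed.
a = matmul(rot_x(0.1), rot_y(0.2))
b = matmul(rot_y(0.2), rot_x(0.1))
order_matters = any(abs(a[i][j] - b[i][j]) > 1e-6
                    for i in range(3) for j in range(3))
```

With all three angles set to zero, `rigid_motion` reduces to the pure translation by vector addition.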

To model the movements of the vehicle, the camera is considered as being stationary and the
environment as moving as one single rigid object relative to the camera. The origin O of coordinate
system 130 is located in the lens center 126 of the camera.

The given task is to reconstruct the vehicle's or the camera's egomotion from visual information. It is
therefore necessary to know the effects of different kinds of vehicle or camera motion upon the camera
image. Under perspective imaging, a point in space $\mathbf{X} = (X\ Y\ Z)^T$ is projected onto a location
on the image plane $\mathbf{x} = (x\ y)^T$ such that

$$ x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}, \tag{5} $$

where $f$ is the focal length of the camera (see Figure 3).

The effects of pure camera rotation are considered first. Ignoring boundary effects, when the camera is
rotated around its lens center 126, the acquired image changes but no new views of the environment are
obtained. Pure camera rotation merely maps the image into itself. The most intuitive effect results from pure
rotation about the Z-axis of the camera-centered coordinate system 130, which is also the optical axis. Any
point in the image moves along a circle centered at the image location $\mathbf{x} = (0\ 0)$. In practice,
however, the amount of rotation $\psi$ of the vehicle about the Z-axis is small. Therefore, vehicle rotation is
confined to the X- and Y-axes, where significant amounts of rotation occur.

The vehicle or camera undergoing rotation about the X-axis by an angle $-\phi$ and the Y-axis by an angle
$-\theta$ moves each 3-D point $\mathbf{X}$ to point $\mathbf{X}'$ relative to the camera.

$$ \mathbf{X} \rightarrow \mathbf{X}' = R_\phi R_\theta \mathbf{X} \tag{6} $$

Equivalently, $\mathbf{x}$, the image point of $\mathbf{X}$, moves to $\mathbf{x}'$ given by

$$ x' = f\,\frac{X\cos\theta + Z\sin\theta}{-X\cos\phi\sin\theta + Y\sin\phi + Z\cos\phi\cos\theta} \tag{7a} $$

$$ y' = f\,\frac{X\sin\phi\sin\theta + Y\cos\phi - Z\sin\phi\cos\theta}{-X\cos\phi\sin\theta + Y\sin\phi + Z\cos\phi\cos\theta} \tag{7b} $$

Inverting the perspective transformation for the original image point $\mathbf{x}$ yields

$$ X = \frac{1}{f}\,Z\,x \quad \text{and} \quad Y = \frac{1}{f}\,Z\,y. \tag{8} $$

The 2-D rotation mapping $r_\phi r_\theta$, which moves each image point $\mathbf{x} = (x\ y)$ into the
corresponding image point $\mathbf{x}' = (x'\ y')$ under camera rotation $R_\phi R_\theta$ (i.e., a particular
sequence of "pan" and "tilt"), is given by

$$ r_\phi r_\theta:\ \mathbf{x} = (x\ y) \rightarrow \mathbf{x}' = (x'\ y') $$

$$ x' = f\,\frac{x\cos\theta + f\sin\theta}{-x\cos\phi\sin\theta + y\sin\phi + f\cos\phi\cos\theta} \tag{9a} $$

$$ y' = f\,\frac{x\sin\phi\sin\theta + y\cos\phi - f\sin\phi\cos\theta}{-x\cos\phi\sin\theta + y\sin\phi + f\cos\phi\cos\theta} \tag{9b} $$

It is important to notice that this transformation contains no 3-D variables and is therefore a mapping of
the image onto itself. This demonstrates that no additional information about the 3-D structure of the scene
can be obtained under pure camera rotation.
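
The depth independence of the rotation mapping can be checked numerically. The sketch below is only an illustration under the stated model: it rotates a 3-D point as in (6) and (7), projects it with (5), and compares the result with the image-only mapping (9). The function names and the numeric values in the usage note are assumptions chosen for the example.

```python
import math

def project(X, Y, Z, f):
    """Perspective projection (5): x = f X / Z, y = f Y / Z."""
    return f * X / Z, f * Y / Z

def rotate_point(X, Y, Z, phi, theta):
    """3-D point after R_phi R_theta, i.e. the terms appearing in (7a) and (7b)."""
    sp, cp = math.sin(phi), math.cos(phi)
    st, ct = math.sin(theta), math.cos(theta)
    Xr = X * ct + Z * st
    Yr = X * sp * st + Y * cp - Z * sp * ct
    Zr = -X * cp * st + Y * sp + Z * cp * ct
    return Xr, Yr, Zr

def rotate_image_point(x, y, f, phi, theta):
    """Image-to-image mapping (9): uses only x, y and f, never the depth Z."""
    sp, cp = math.sin(phi), math.cos(phi)
    st, ct = math.sin(theta), math.cos(theta)
    d = -x * cp * st + y * sp + f * cp * ct
    return f * (x * ct + f * st) / d, f * (x * sp * st + y * cp - f * sp * ct) / d

def image_of_rotated_point(x, y, Z, f, phi, theta):
    """Back-project image point (x, y) to depth Z via (8), rotate, and re-project."""
    X, Y = x * Z / f, y * Z / f
    return project(*rotate_point(X, Y, Z, phi, theta), f)
```

For example, with `f = 500`, calling `image_of_rotated_point(40, -30, Z, 500, 0.02, 0.03)` for `Z = 5` and for `Z = 50` gives the same image location as `rotate_image_point(40, -30, 500, 0.02, 0.03)`, up to rounding.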

An interesting property of this mapping should be mentioned at this point, which might not be obvious.
Moving an image point on a diagonal passing through the center of the image at 45° by only rotating the
camera does not result in equal amounts of rotation about the X- and the Y-axis. This is again a
consequence of the successive application of the two rotations $R_\theta$ and $R_\phi$, since the first
rotation about the Y-axis also changes the orientation of the camera's X-axis in 3-D space. It also explains
why the pair of equations in (7) is not symmetric with respect to $\theta$ and $\phi$.

In measuring the amount of camera rotation, the problem to be solved is the following: Given are two
image locations $\mathbf{x}_0$ and $\mathbf{x}_1$, which are the observations of the same 3-D point at time
$t_0$ and time $t_1$. The question here is the amount of rotation $R_\phi$ and $R_\theta$ which, applied to
the camera between time $t_0$ and time $t_1$, would move image point $\mathbf{x}_0$ onto $\mathbf{x}_1$,
assuming that no camera translation occurred at the same time. If $R_\phi$ and $R_\theta$ are applied to the
camera separately, the points in the image move along hyperbolic paths. If pure horizontal rotation were
applied to the camera, a given image point $\mathbf{x}_0$ would move on a path described by

$$ r_\theta(\mathbf{x}_0):\quad y^2 = y_0^2\,\frac{f^2 + x^2}{f^2 + x_0^2}. \tag{10} $$

Similarly, pure vertical camera rotation would move an image point $\mathbf{x}_1$ along

$$ r_\phi(\mathbf{x}_1):\quad x^2 = x_1^2\,\frac{f^2 + y^2}{f^2 + y_1^2}. \tag{11} $$

Since the 3-D rotation of the camera is modeled as being performed in two separate steps ($R_\theta$
followed by $R_\phi$), the rotation mapping $r_\phi r_\theta$ can also be separated into $r_\theta$ followed by
$r_\phi$. In the first step, applying pure (horizontal) rotation around the Y-axis $r_\theta$, point
$\mathbf{x}_0$ is moved to an intermediate image location $\mathbf{x}_c$. The second step, applying pure
(vertical) rotation around the X-axis $r_\phi$, takes point $\mathbf{x}_c$ to the final image location
$\mathbf{x}_1$. This can be expressed as

$$ r_\phi r_\theta = r_\phi \circ r_\theta, \quad \text{where} \quad
r_\theta:\ \mathbf{x}_0 = (x_0\ y_0) \rightarrow \mathbf{x}_c = (x_c\ y_c), \qquad
r_\phi:\ \mathbf{x}_c = (x_c\ y_c) \rightarrow \mathbf{x}_1 = (x_1\ y_1). \tag{12} $$

Figure 4 reveals a successive application of horizontal and vertical rotation. Image point $\mathbf{x}_0$ is
to be moved to location $\mathbf{x}_1$ by pure horizontal and vertical camera rotation. Horizontal rotation
(about the Y-axis) is applied first, moving $\mathbf{x}_0$ to $\mathbf{x}_c$, which is the intersection point of
the two hyperbolic paths for horizontal and vertical rotation. In a second step, $\mathbf{x}_c$ is taken to
$\mathbf{x}_1$. Then the two rotation angles $\theta$ and $\phi$ are found directly. Further in Figure 4, the
image point $\mathbf{x}_c = (x_c\ y_c)$ is the intersection point of the hyperbola passing through
$\mathbf{x}_0$ resulting from horizontal camera rotation (10) with the hyperbola passing through
$\mathbf{x}_1$ resulting from vertical camera rotation (11). Intersecting the two hyperbolae gives the image
point $\mathbf{x}_c$, with

$$ x_c = f\,x_1 \left[ \frac{f^2 + x_0^2 + y_0^2}{(f^2 + x_0^2)(f^2 + y_1^2) - x_1^2\,y_0^2} \right]^{1/2} \tag{13a} $$

$$ y_c = f\,y_0 \left[ \frac{f^2 + x_1^2 + y_1^2}{(f^2 + x_0^2)(f^2 + y_1^2) - x_1^2\,y_0^2} \right]^{1/2} \tag{13b} $$

The amount of camera rotation necessary to map $\mathbf{x}_0$ onto $\mathbf{x}_1$ by applying $R_\theta$
followed by $R_\phi$ is finally obtained as

$$ \theta = \tan^{-1}\frac{x_c}{f} - \tan^{-1}\frac{x_0}{f} \tag{14} $$

$$ \phi = \tan^{-1}\frac{y_c}{f} - \tan^{-1}\frac{y_1}{f} \tag{15} $$
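
The recovery of the two rotation angles from a single point pair can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the intersection point of the two hyperbolae has the closed form $x_c = f x_1 \sqrt{(f^2 + x_0^2 + y_0^2)/D}$, $y_c = f y_0 \sqrt{(f^2 + x_1^2 + y_1^2)/D}$ with $D = (f^2 + x_0^2)(f^2 + y_1^2) - x_1^2 y_0^2$, which follows from solving (10) and (11) simultaneously. The helper names `pan`, `tilt`, `intersection_point` and `rotation_angles` are hypothetical.

```python
import math

def pan(x, y, f, theta):
    """Pure horizontal rotation r_theta (mapping (9) with phi = 0)."""
    d = -x * math.sin(theta) + f * math.cos(theta)
    return f * (x * math.cos(theta) + f * math.sin(theta)) / d, f * y / d

def tilt(x, y, f, phi):
    """Pure vertical rotation r_phi (mapping (9) with theta = 0)."""
    d = y * math.sin(phi) + f * math.cos(phi)
    return f * x / d, f * (y * math.cos(phi) - f * math.sin(phi)) / d

def intersection_point(x0, y0, x1, y1, f):
    """Intersection x_c of the hyperbola (10) through x0 and the hyperbola (11) through x1."""
    denom = (f * f + x0 * x0) * (f * f + y1 * y1) - x1 * x1 * y0 * y0
    xc = f * x1 * math.sqrt((f * f + x0 * x0 + y0 * y0) / denom)
    yc = f * y0 * math.sqrt((f * f + x1 * x1 + y1 * y1) / denom)
    return xc, yc

def rotation_angles(x0, y0, x1, y1, f):
    """Angles (theta, phi) that move x0 onto x1 by r_theta followed by r_phi, per (14), (15)."""
    xc, yc = intersection_point(x0, y0, x1, y1, f)
    theta = math.atan(xc / f) - math.atan(x0 / f)
    phi = math.atan(yc / f) - math.atan(y1 / f)
    return theta, phi
```

A round trip is a convenient check: map a point with `pan` and then `tilt` for known angles, and `rotation_angles` should return those angles up to rounding. The sketch assumes small rotations, so that the denominators stay positive and the signs of $x_c$ and $y_c$ follow those of $x_1$ and $y_0$.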

When the vehicle or camera undergoes pure translation between time $t$ and time $t'$, every point on the
vehicle is moved by the same 3-D vector $\mathbf{T} = (U\ V\ W)^T$. Again, the same effect is achieved by
keeping the camera fixed and moving every point $\mathbf{X}_i$ in the environment to $\mathbf{X}_i'$ by
applying $-\mathbf{T}$.

Since every stationary point in the environment undergoes the same translation relative to the camera,
the imaginary lines between corresponding points $\mathbf{X}_i \mathbf{X}_i'$ are parallel in 3-D space.

It is a fundamental result from perspective geometry that the images of parallel lines pass through a
single point in the image plane called a "vanishing point". When the camera moves along a straight line,
every (stationary) image point seems to expand from this vanishing point or contract towards it when the
camera moves backwards. This particular image location is therefore commonly referred to as the focus of
expansion (FOE) or the focus of contraction (FOC). Each displacement vector passes through the FOE,
creating the typical radial expansion pattern shown in Figure 5. Figure 5 reveals the location of the FOE.
With pure vehicle translation, points in the environment (A, B) move along 3-D vectors parallel to the vector
pointing from lens center 126 to the FOE in camera plane 128 (Figure 3). These vectors form parallel lines
in space which have a common vanishing point (the FOE) in the perspective image.

As can be seen in Figure 5, the straight line passing through the lens center of the camera and the FOE
is also parallel to the 3-D displacement vectors. Therefore, the 3-D vector $\overrightarrow{OF}$ points in the
direction of camera translation in space. Knowing the internal geometry of the camera (i.e., the focal length
$f$), the direction of vehicle translation can be determined by locating the FOE in the image. The actual
translation vector $\mathbf{T}$ applied to the camera is a multiple of the vector $\overrightarrow{OF}$, which
supplies only the direction of camera translation but not its magnitude. Therefore,

$$ \mathbf{T} = \lambda\,\overrightarrow{OF} = \lambda\,[x_f\ \ y_f\ \ f]^T, \quad \lambda \in \mathbb{R}. \tag{16} $$
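
The radial pattern under pure translation can be verified with a few lines of Python. The sketch below is an illustration under the stated camera model, with hypothetical function names; it assumes a nonzero forward component $W$, so that the FOE lies at a finite image location $(fU/W,\ fV/W)$ as implied by (16).

```python
def project(point, f):
    """Perspective projection of a 3-D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    return f * X / Z, f * Y / Z

def foe_from_translation(t, f):
    """Image location where the translation vector T = (U, V, W) meets the image plane."""
    U, V, W = t
    return f * U / W, f * V / W

def displacement_and_foe(point, t, f):
    """Image positions before and after translation, the FOE, and a collinearity measure.

    The camera moving by T is modeled as the scene point moving by -T.
    The returned cross product is zero when both image positions and the FOE
    lie on one straight line.
    """
    before = project(point, f)
    moved = tuple(p - d for p, d in zip(point, t))
    after = project(moved, f)
    foe = foe_from_translation(t, f)
    cross = ((before[0] - foe[0]) * (after[1] - foe[1])
             - (before[1] - foe[1]) * (after[0] - foe[0]))
    return before, after, foe, cross
```

For instance, with `f = 500` and `t = (0.2, -0.1, 1.0)` the FOE is at `(100, -50)`, and the cross product vanishes (up to rounding) for any stationary point in front of the camera.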

Since most previous work incorporated a velocity-based model of 3-D motion, the focus of expansion
has commonly been interpreted as the direction of instantaneous heading, i.e., the direction of vehicle
translation during an infinitely short period in time. When images are given as "snapshots" taken at discrete
instances of time, the movements of the vehicle must be modeled accordingly as discrete movements from
one position in space to the next. Therefore, the FOE cannot be interpreted as the momentary direction of
translation at a certain point in time, but rather as the direction of accumulated vehicle translation over a
period of time.

Figure 6 illustrates the concept of the FOE for discrete time steps. The motion of vehicle 132 between two
points in time can be decomposed into a translation followed by a rotation. The image effects of pure
translation ($\mathrm{FOE}_a$) are observed in image $I_0$. Figure 6 shows the top view of vehicle 132
traveling along a curved path at two instances in time $t_0$ and $t_1$. The position of the vehicle 132 in
space is given by the position of a reference point $P$ on the vehicle, and the orientation of the vehicle is
$\Omega$. Figure 6 also displays the adopted scheme of 3-D motion decomposition. First, the translation
$\mathbf{T}$ is applied, which shifts the vehicle's reference point (i.e., lens center 126 of the camera) from
position $P_0$ to position $P_1$ without changing the vehicle's orientation $\Omega$. The 3-D translation
vector $\mathbf{T}$ intersects image plane 128 at $\mathrm{FOE}_a$. In the second step the vehicle is rotated
by $\omega$ to the new orientation $\Omega_1$. Translation $\mathbf{T}$ transforms image $I_0$ into image
$I_1'$, which again is transformed into $I_1$ by rotation $\omega$. The important fact is that
$\mathrm{FOE}_a$ is observed at the transition from image $I_0$ to image $I_1'$, which is obtained by
derotating image $I_1$ by $-\omega$. Throughout the present description, this scheme (Figure 6) is used as a
model for vehicle or camera motion.
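
The translate-then-rotate scheme of Figure 6 can be illustrated with a small simulation. This is only a sketch with hypothetical names: it assumes the rotation is a pure horizontal rotation about the Y-axis, modeled as an image-to-image mapping, and it shows that derotating the observed image point by the opposite angle restores the point that lies on the radial line through the FOE.

```python
import math

def project(point, f):
    """Perspective projection of a 3-D point (X, Y, Z)."""
    X, Y, Z = point
    return f * X / Z, f * Y / Z

def pan(x, y, f, omega):
    """Image mapping for a pure horizontal rotation by omega (radians)."""
    d = -x * math.sin(omega) + f * math.cos(omega)
    return f * (x * math.cos(omega) + f * math.sin(omega)) / d, f * y / d

def collinearity(a, b, c):
    """Cross product of (a - c) and (b - c); zero when a, b, c are collinear."""
    return (a[0] - c[0]) * (b[1] - c[1]) - (a[1] - c[1]) * (b[0] - c[0])

def simulate_step(point, t, omega, f):
    """Return image points I0, I1 (observed), derotated I1, and the FOE for one time step."""
    i0 = project(point, f)                                  # image at t0
    moved = tuple(p - d for p, d in zip(point, t))          # scene moves by -T
    i1_prime = project(moved, f)                            # translation only
    i1 = pan(i1_prime[0], i1_prime[1], f, omega)            # then rotation: observed at t1
    derotated = pan(i1[0], i1[1], f, -omega)                # undo the rotation
    foe = (f * t[0] / t[2], f * t[1] / t[2])
    return i0, i1, derotated, i1_prime, foe
```

In this setting the raw displacement from `i0` to `i1` generally does not point away from the FOE, whereas the displacement from `i0` to `derotated` does.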

The amount of camera translation can be measured. Figure 7 shows the geometric relationships for the
2-D case. The amount of expansion from the FOE for discrete time steps is illustrated. Figure 7 can be
considered as a top view of the camera, i.e., a projection onto the X/Z-plane of the camera-centered
coordinate system 130. The cross section of the image plane is shown as a straight line. The camera
moves by a vector $\mathbf{T}$ in 3-D space, which passes through lens center 126 and the FOE in camera
plane 128. The 3-D Z-axis is also the optical axis of the camera. The camera is translating from left to right
in the direction given by $\mathbf{T} = (x_f\ f)^T$.

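A stationary 3-D point is observed at two instances of time, which moves in space relative to the
camera from $\mathbf{X}$ to $\mathbf{X}'$, resulting in two images $x$ and $x'$.

$$ \mathbf{X} = \begin{bmatrix} X \\ Z \end{bmatrix} \quad \text{and} \quad
\mathbf{X}' = \begin{bmatrix} X' \\ Z' \end{bmatrix} = \begin{bmatrix} X - \Delta X \\ Z - \Delta Z \end{bmatrix} \tag{17} $$

Using the inverse perspective transformation (8) yields

$$ Z = \frac{f}{x}\,X \quad \text{and} \quad
Z' = Z - \Delta Z = \frac{f}{x'}\,X' = \frac{f}{x'}\,(X - \Delta X). \tag{18} $$

From similar triangles (shaded in Figure 7)

$$ \frac{\Delta X}{\Delta Z} = \frac{x_f}{f}, \tag{19} $$

and therefore

$$ Z = \Delta Z\,\frac{x' - x_f}{x' - x} = \Delta Z \left( 1 + \frac{x - x_f}{x' - x} \right). \tag{20} $$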

Thus, the rate of expansion of image points from the FOE contains direct information about the distance
of the corresponding 3-D points from the camera. Consequently, if the vehicle is moving along a straight
line and the FOE has been located, the 3-D structure of the scene can be determined from the expansion
pattern in the image. However, the distance Z of a 3-D point from the camera can only be obtained up to
a scale factor ΔZ, which is the distance that the vehicle advanced along the Z-axis during the elapsed
time.
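
As an illustration of equation (20), the following minimal Python sketch recovers the range Z of a stationary point from its two image positions and the FOE in the 2-D (X/Z) case. The function names, the unit focal length and all numeric values are assumptions chosen for the example; they are not taken from the description.

```python
def range_from_expansion(x, x_prime, x_foe, delta_z):
    """Equation (20): Z = dZ * (x' - x_f) / (x' - x).

    x, x_prime : image positions of the point before and after the translation
    x_foe      : image position of the focus of expansion
    delta_z    : distance advanced along the Z-axis (the scale factor)
    """
    expansion = x_prime - x
    if expansion == 0:
        raise ValueError("no image expansion, range is undefined")
    return delta_z * (x_prime - x_foe) / expansion


def project(X, Z, f=1.0):
    """Perspective projection of the point (X, Z) onto the image line."""
    return f * X / Z


# Synthetic check with hypothetical numbers.
f = 1.0
X, Z = 2.0, 10.0           # point position before the camera moves
dX, dZ = 0.5, 1.0          # camera translation between the two frames
x = project(X, Z, f)
x_prime = project(X - dX, Z - dZ, f)
x_foe = f * dX / dZ        # from the similar-triangle relation dX/dZ = x_f/f
estimated_Z = range_from_expansion(x, x_prime, x_foe, dZ)
```

If only the image measurements are available, delta_z can be set to 1 and the result read as range in units of the distance travelled.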

When the velocity of the vehicle (ΔZ / t) in space is known, the absolute range of any stationary point
can be computed. Alternatively, the velocity of the vehicle can be obtained if the actual range of a point in
the scene is known (e.g., from laser range data). In practice, of course, any such technique requires that the
FOE can be located in a small area, and that the observed image points exhibit significant expansion away from
the FOE. As shown below, imaging noise and camera distortion pose problems in the attempt to assure that
both of the above requirements are met.

If a set of stationary 3-D points { (X_i, X'_i) } is observed, then of course the translation in the Z-direction
is the same for every point:

$$ Z_i - Z'_i = Z_j - Z'_j = \Delta Z \quad \text{for all } i, j. \tag{21} $$

Therefore, the range of every point is proportional to the observed amount of expansion of its image away
from the FOE,

$$ Z_i \propto \frac{x'_i - x_f}{x'_i - x_i}, \tag{22} $$



which renders the relative 3-D structure of the set of points. 

The effects of camera translation T can be formulated as a mapping t of a set of image locations {x_i}
into another set of image locations {x'_i}. Unlike in the case of pure camera rotation, this mapping not only
depends upon the 3-D translation vector but also upon the actual 3-D location of each individual point
observed. Therefore, in general, t is not simply a mapping of the image onto itself. However, one important
property of t can be described exclusively in image plane 128, namely that each point must map onto a
straight line passing through the original point and one unique location in the image (the FOE). This means
that if vehicle 132 is undergoing pure translation, then there must exist an image location x_f such that the
mapping t satisfies the condition radial-mapping (x_f, I, I'):

$$ t = \{ (x_i, x'_i) \in I \times I' \mid x'_i = x_i + \mu_i (x_i - x_f),\ \mu_i \in \mathbb{R},\ \mu_i \ge 0 \}. \tag{23} $$
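
The radial-mapping condition (23) can be checked point by point: each displacement must be parallel to the ray from the FOE through the original point and must not point back towards the FOE. The sketch below is one possible way to express that test in Python; the function name, the absolute tolerance and the sample coordinates are assumptions made for the example.

```python
def is_radial_mapping(x_foe, points, points_new, tol=1e-9):
    """Return True if every pair (x_i, x'_i) satisfies
    x'_i = x_i + mu_i * (x_i - x_f) with mu_i >= 0 (equation 23).

    tol is an absolute tolerance on the cross and dot products, so it
    depends on the scale of the coordinates.
    """
    fx, fy = x_foe
    for (x, y), (xn, yn) in zip(points, points_new):
        rx, ry = x - fx, y - fy        # ray from the FOE through x_i
        dx, dy = xn - x, yn - y        # displacement x'_i - x_i
        cross = rx * dy - ry * dx      # zero when the two are parallel
        dot = rx * dx + ry * dy        # negative when pointing at the FOE
        if abs(cross) > tol or dot < -tol:
            return False
    return True


# Hypothetical data: three points expanding away from an FOE at the origin.
foe = (0.0, 0.0)
pts = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
expanded = [(1.5 * x, 1.5 * y) for x, y in pts]
shifted = [(x + 0.3, y) for x, y in expanded]     # a small rotation-like shift
contracted = [(0.5 * x, 0.5 * y) for x, y in pts]

radial_ok = is_radial_mapping(foe, pts, expanded)
radial_shifted = is_radial_mapping(foe, pts, shifted)
radial_contracted = is_radial_mapping(foe, pts, contracted)
```

With noisy image data an exact test of this kind is of little use, which is why the description goes on to measure the deviation from a radial mapping instead.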

When the vehicle is not undergoing pure translation or rotation but combined 3-D motion (i.e.,
translation and rotation) of the form R_φ R_θ T, the effects in the image are described by a transformation d
(for displacement), which is a combination of r_φ, r_θ and t:

$$ d : I \rightarrow I' = r_\phi \, r_\theta \, t \, (I), \tag{24} $$

where I = {x_i} and I' = {x'_i} are the two sets of corresponding image points. Figures 8a and 8b show a typical
displacement field for a camera undergoing horizontal and vertical rotation as well as translation. The points
x_i ∈ I are marked with small circles 138. Rectangle 134 marks the area of search for the FOE. The derotated
displacement field is illustrated in Figure 8b with the FOE marked by circle 136.

By decomposing a composite displacement field d into its three components r_φ, r_θ, and t, the vehicle's
rotation and direction of translation in space can be computed from the information available in the image.
This problem is addressed below. As discussed in the previous section, the 3-D motion M of the vehicle is
modeled by a translation T followed by a rotation R_θ about the Y-axis and a rotation R_φ about the X-axis:

$$ M = R_\phi \, R_\theta \, T. \tag{25} $$

This results in a mapping d from the original image I_0 at time t_0 into the new image I_1 at time t_1:

$$ d : I_0 \rightarrow I_1 = r_\phi \, r_\theta \, t \, I_0 = r_\phi \, r_\theta \, I'_0. \tag{26} $$

The intermediate image I'_0 in (26) is the result of the translation component of the vehicle's motion and has
the property of being a radial mapping (23). Unlike the two images I_0 and I_1, which are actually given, the
image I'_0 is generally not observed, except when the camera rotation is zero. It serves as an intermediate
result to be reached during the separation of translational and rotational motion components.

The question at this point is whether there exists more than one combination of rotation mappings r_φ
and r_θ which would satisfy this requirement, i.e., whether the solution is unique. It has been pointed out above that
the decomposition of 3-D motion into R_φ, R_θ, R_ψ, and T is unique for a fixed order of application. This does
not imply, however, that the effects of 3-D motion upon the perspective image are unique as well.

Related art has shown that seven points in two perspective views suffice to obtain a unique interpreta- 
tion in terms of rigid body motion and structure, except for a few cases where points are arranged in some 
very special configuration in space. Further art reports computer experiments which suggest that six points 
are sufficient in many cases and seven or eight points yield unique interpretations in most cases. 

Due to its design and the application, however, the motion of a typical autonomous land vehicle (ALV) in 
space is quite restricted. The vehicle can only travel upright on a surface and its large wheelbase allows for 
only relatively small changes in orientation. It is also heavy and thus exhibits considerable inertia. Therefore, 
the final motion parameters must lie within a certain narrow range and it can be expected that a unique 
solution can be found even in cases when the number of points is near or above the minimum. 

The fact that

$$ I'_0 = r_\theta^{-1} \, r_\phi^{-1} \, I_1 = t \, I_0 \tag{27} $$

suggests two different strategies for separating the motion components: (1) FOE from rotation: successively
apply combinations of inverse rotation mappings r_θ^{-1} r_φ^{-1} to the second image I_1 until the resulting
image I' is a radial mapping with respect to the original image I_0, then locate the FOE x_f in I_0; and
(2) rotation from FOE: successively select FOE-locations (different directions of vehicle translation)
x_f1, x_f2, ... in the original image I_0 and then determine the inverse rotation mapping r_θ^{-1} r_φ^{-1}
that yields a radial mapping with respect to the given FOE in the original image I_0.

Both alternatives were investigated under the assumption of restricted, but realistic vehicle motion, as
stated earlier. It turned out that the major problem in the FOE-from-rotation approach is to determine if a
mapping of image points is (or is close to being) radial when the location of the FOE is unknown. Of course,
in the presence of noise, this problem becomes even more difficult. The second approach was examined
after it appeared that any method which extends the given set of displacement vectors backwards to find
the FOE is inherently sensitive to image degradations.

Although there have been a number of suggestions for FOE-algorithms in the past, no results of
implementations have been demonstrated on real outdoor imagery. One reason for the absence of useful
results might be that most researchers have tried to locate the FOE in terms of a single, distinct image
location. In practice, however, the noise generated by merely digitizing a perfect translation displacement
field may keep the resulting vectors from passing through a single pixel. Even for human observers it
seems to be difficult to determine the exact direction of heading (i.e., the location of the FOE on the retina).
Average deviation of human judgement from the real direction has been reported to be as large as 10° and
up to 20° in the presence of large rotations. It was, therefore, an important premise in this work that the
final algorithm should determine an area of potential FOE-locations (called the "fuzzy FOE") instead of a
single (but probably incorrect) point.

The FOE may be obtained from rotation. In this method, the image motion is decomposed in two steps.
First, the rotational components are estimated and their inverses are applied to the image, thus partially
"derotating" the image. If the rotation estimate was accurate, the resulting displacement field after
derotation would diverge from a single image location (the FOE). The second step verifies that the
displacement field is actually radial and determines the location of the FOE. For this purpose, two problems
have to be solved: (1) how to estimate the rotational motion components without knowing the exact location
of the FOE, and (2) how to measure the "goodness of derotation" and locate the FOE.

The rotational components can be estimated. Each vector in the displacement field is the sum of vector
components caused by camera rotation and camera translation. Since the displacement caused by
translation depends on the depth of the corresponding points in 3-D space (equation 18), points located at a
large distance from the camera are not significantly affected by camera translation. Therefore, one way of
estimating vehicle rotation is to compute θ and φ from displacement vectors which are known to belong to
points at far distance. Under the assumption that those displacement vectors are only caused by rotation,
equations 14 and 15 can be applied to find the two angles. In some situations, distant points are selected
easily. For example, points on the horizon are often located at a sufficient distance from the vehicle. Image
points close to the axis of translation would be preferred because they expand from the FOE more slowly than
other points at the same depth. However, points at far distances may not always be available or may not be
known to exist in the image. In those cases, the following method for estimating the rotational components
can be used. The design of the ALV (and most other mobile robots) does not allow rapid changes in the
direction of vehicle heading. Therefore, it can be assumed that the motion of the camera between two
frames is constrained, such that the FOE can change its location only within a certain range. If the FOE was
located in one frame, the FOE in the subsequent frame must lie in a certain image region around the
previous FOE location. Figure 9a shows an image plane which illustrates this situation. The FOE of the
previous frame was located at the center of square 140, which outlines the region of search for the current
FOE; thus the FOE in the given frame must be inside square 140. Three displacement vectors are shown
(P1→P1', P2→P2', P3→P3'). The translational components (P1→Q1, P2→Q2, P3→Q3) of those
displacement vectors and the FOE (inside square 140) are not known at this point in time but are marked in
Figure 9a.

The main idea of this technique is to determine the possible range of camera rotations which would be
consistent with the FOE lying inside marked region 140. Since the camera rotates about two axes, the
resulting range of rotations can be described as a region in a 2-D space. Figure 9b shows this rotation
space with the two axes θ and φ corresponding to the amount of camera rotation around the Y-axis and the
X-axis, respectively. The initial rotation estimate is a range of ±10° in both directions, which is indicated by
a square 142 in rotation space of Figure 9b.
In general, the range of possible rotations is described by a closed, convex polygon in rotation space. A
particular rotation (θ', φ') is possible if its application to every displacement vector (i.e., to its endpoint)
yields a new vector which lies on a straight line passing through the maximal FOE-region. The region of
possible rotations is successively constrained by applying the following steps for every displacement vector
(Figures 10a and 10b). First, apply the rotation mapping defined by the vertices of the rotation polygon to the
endpoint P' of the displacement vector P→P' 146. This yields a set of image points P_i. Second, connect the
points P_i to a closed polygon 148 in the image. Polygon 148 is similar to the rotation polygon 144 but
distorted by the nonlinear rotation mapping, as shown in Figure 10a. Third, intersect polygon 148 in the
image with open triangle 150, formed by the starting point P of the displacement vector 146 and defined by
two tangents 152 and 154 onto the maximal FOE-region. Rotations that would bring the endpoint of the
displacement vector outside triangle 150 are not feasible. The result is a new (possibly empty) polygon 156
in the image plane. Fourth, map new polygon 156 from the image plane back into the rotation space of Figure
9b. Fifth, if rotation polygon 158 is empty (number of vertices is zero), then stop: no camera rotation is
possible that would make all displacement vectors intersect the given FOE-region. Repeat the process
using a larger FOE-region.

Figures 11a, 11b and 11c show the changing shape of the rotation polygon during the application of this
process to the three displacement vectors in Figure 8.

Since the mapping from rotation space to the image plane is nonlinear (equation 9), the straight lines
between vertices in the rotation polygon 160 do not correspond to straight lines in the image. They are,
however, approximated as straight lines in order to simplify the intersection with the open triangle. The
dotted lines in the image plane show the actual mapping of the rotation polygon onto the image. It can be
seen that the deviations from straight lines are small and can be neglected. Figure 11a shows the rotation
polygon after examining displacement vector P1→P1'. Any camera rotation inside the polygon would move
the endpoint of the displacement vector (P1') into the open triangle formed by tangents 164 and 166 through
P1 to the maximal FOE-region given by square 168 in the image plane. The actual mapping of the rotation
polygon into the image plane is shown with a dotted outline.

Figure 11b shows the rotation polygon 170 after examining displacement vectors P1→P1' and P2→P2'.
Figure 11c shows the final rotation polygon after examining the three displacement vectors P1→P1',
P2→P2' and P3→P3'. The amount of actual camera rotation (θ = -2.0°, φ = 5.0°) is marked with a small
circle (arrow) 172.

Increasing the number of displacement vectors improves the rotation estimate. In practice, the amount 
of camera rotation can be constrained to a range of below 1° in both directions. Rotation can be estimated
more accurately when the displacement vectors are short, i.e., when the amount of camera translation is 
small. This is in contrast to estimating camera translation which is easier with long displacement vectors. 

The situation when the rotation polygon becomes empty requires some additional considerations. As
mentioned earlier, in such a case no camera rotation is possible that would make all displacement vectors
pass through the given FOE-region. This could indicate one of two alternatives. First, at least one of the
displacement vectors belongs to a moving object. Second, the given FOE-region does not contain the
actual location of the FOE, i.e., the region is not feasible. The latter case is of particular importance. If a
region can be determined not to contain the FOE, then the FOE must necessarily lie outside this region.
Therefore, the above method can not only be used to estimate the amount of camera rotation, but also to
search for the location of the FOE. Unfortunately, if the rotation polygon does not become empty, this does
not imply that the FOE is actually inside the given region. It only means that all displacement vectors would
pass through this region, not that they have a common intersection inside this region. However, if not all
vectors pass through a certain region, then this region cannot possibly contain the FOE. The following
recursive algorithm searches a given region for the FOE by splitting it into smaller pieces
(divide-and-conquer):

MIN-FEASIBLE (region, min-size, disp-vectors):
    if SIZE (region) < min-size then return (region)
    else
        if FEASIBLE (region, disp-vectors) then
            return (union (
                MIN-FEASIBLE (sub-region-1, min-size, disp-vectors),
                MIN-FEASIBLE (sub-region-2, min-size, disp-vectors),
                ...
                MIN-FEASIBLE (sub-region-n, min-size, disp-vectors)))
        else return (nil) {region does not contain the FOE}
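
The recursion can be sketched in Python as follows. This is only an illustration under stated assumptions: regions are axis-aligned squares written as (x, y, size), each region is split into four subsquares, and the feasibility test is passed in as a callable, because the actual test (a non-empty rotation polygon for all displacement vectors) is not reproduced here. The predicate contains_target is a hypothetical stand-in for that test.

```python
def min_feasible(region, min_size, is_feasible):
    """Return the list of smallest subregions that were not discarded.

    region      : (x, y, size) of an axis-aligned square (assumed layout)
    min_size    : squares smaller than this are returned without a test,
                  mirroring the first line of the pseudocode
    is_feasible : callable(region) -> bool, stand-in for FEASIBLE
    """
    x, y, size = region
    if size < min_size:
        return [region]
    if not is_feasible(region):
        return []                      # region does not contain the FOE
    half = size / 2.0
    result = []
    for sx, sy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        result.extend(min_feasible((sx, sy, half), min_size, is_feasible))
    return result


def contains_target(region, target=(3.0, 3.0)):
    """Hypothetical feasibility test: the square contains a known point."""
    x, y, size = region
    return x <= target[0] <= x + size and y <= target[1] <= y + size


feasible_squares = min_feasible((0.0, 0.0, 8.0), 2.0, contains_target)
```

In this toy run the 8 x 8 search square is narrowed down to the four unit squares that make up the 2 x 2 block around the target.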

[...] smallest feasible FOE-region by systematically discarding subregions from
[...] case that the shape of the original region is a square, subregions can be
[...] into four subsquares of equal size. The simple version shown here performs
[...] smallest subregion (limited by the parameter "min-size"), which is neither
[...] efficient approach. The algorithm can be significantly improved by applying a

[...] example, by trying to discard subregions around the perimeter first before
[...]. Two major problems were encountered with the latter method. First, the
[...]tensive since the process of computing feasible rotations must be repeated
[...] a small region is more likely to be discarded than a larger one. However,
[...] becomes too small, errors induced by noise, distortion, or point-tracking may
[...] from passing through a region which actually contains the FOE.
[...] not employed in the further treatment, it suggests an interesting alternative
[...] traditional FOE-algorithms. Its main attractiveness is that it is inherently

[...] most other techniques, which search for a single FOE-location. For the
[...] amount of rotation, the method using points at far distance mentioned earlier is
[...] other alternatives for locating the FOE once the rotation components have
[...] in the following.

[...] partially derotated image may be attempted. After applying a particular derotation
[...] field, the question is how close the new displacement field is to a radial
[...] diverge from one image location. If the displacement field is really radial, then
[...]ted and only the components due to camera translation remain. Two different
[...] property are noted. One method uses the variance of intersection at imaginary
[...] The second method computes the linear correlation coefficient to measure how
[...] is. The variance of intersection in related art suggests to estimate the
[...] displacement field by computing the variance of intersections of one displacement vector
[...] intersections lie in a small neighborhood, then the variance is small, which
[...] field is almost radial. The problem can be simplified by using an imaginary
[...], whose orientation is not affected by different camera rotations. Figure 12
[...] P_1→P'_1 ... P_5→P'_5 intersecting a vertical line at x at y_1, ..., y_5. Moving the vertical
[...]ng the points of intersection closer together and will thus result in a smaller
[...] point of intersection of a displacement vector P_i→P'_i with a vertical line at x is

$$ y_i(x) = y_i + (x - x_i)\, \frac{y'_i - y_i}{x'_i - x_i}, \tag{28} $$

with P_i = (x_i, y_i) and P'_i = (x'_i, y'_i).

The variance of intersection of all displacement vectors with the vertical line at position x is

$$ \sigma^2(x) = \frac{1}{N} \sum_{i=1}^{N} y_i(x)^2 - \left( \frac{1}{N} \sum_{i=1}^{N} y_i(x) \right)^2. \tag{29} $$


To find the vertical cross section with minimum intersection variance, the first derivative of (29) with respect
to x is set to zero. The location x_0 of minimum intersection variance is then obtained. Similarly, the position
of a horizontal cross section with minimal intersection variance can be obtained.
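
A short Python sketch may make the minimisation concrete. It assumes that each displacement vector is extended to a non-vertical straight line y(x) = a + b x; the variance of the intersections with a vertical line at x is then a quadratic in x, and setting its derivative to zero gives x_0 = -cov(a, b) / var(b). The function names and the synthetic field are assumptions for the example, and the population variance (division by N) is used.

```python
def line_coefficients(p, p_new):
    """Intercept a and slope b of the line through p and p_new.
    Assumes the displacement vector is not vertical."""
    (x, y), (xn, yn) = p, p_new
    slope = (yn - y) / (xn - x)
    return y - slope * x, slope


def intersection_variance(vectors, x):
    """Variance of the intersections of all vectors with the line at x."""
    ys = []
    for p, q in vectors:
        a, b = line_coefficients(p, q)
        ys.append(a + b * x)
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys) / len(ys)


def min_variance_position(vectors):
    """Position x_0 where the derivative of the variance is zero.
    Assumes the slopes are not all equal."""
    coeffs = [line_coefficients(p, q) for p, q in vectors]
    n = len(coeffs)
    mean_a = sum(a for a, _ in coeffs) / n
    mean_b = sum(b for _, b in coeffs) / n
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in coeffs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in coeffs) / n
    return -cov_ab / var_b


# Hypothetical, perfectly radial field expanding from an FOE at (3, 4).
foe = (3.0, 4.0)
starts = [(5.0, 5.0), (1.0, 7.0), (6.0, 0.0), (0.0, 1.0)]
field = [(p, (p[0] + 0.2 * (p[0] - foe[0]), p[1] + 0.2 * (p[1] - foe[1])))
         for p in starts]
x0 = min_variance_position(field)
variance_at_x0 = intersection_variance(field, x0)
```

For this noise-free radial field the minimum lies at the x-coordinate of the FOE and the variance there vanishes.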

The square root of the variance of intersection (standard deviation) at a vertical line was evaluated on 
the synthetic displacement field shown in Figure 13. The actual FOE is located in the center of the image. 
Square 174 around the center (±100 pixels in both directions) marks the region over which the error 
functions are evaluated. 

Figures 14a, b, c and d show the distribution of the intersection standard deviation for increasing
residual rotations in vertical direction in the absence of noise. The horizontal rotation is 0° in all cases
represented by these Figures. Locations of displacement vectors are represented by real numbers (not
rounded to integer values). In Figure 14a, no residual rotation exists, i.e., the displacement field is perfectly
radial. The value of the horizontal position of the cross section varies ±100 pixels around the actual FOE.
The standard deviation is zero for x = x_f (the x-coordinate of the FOE) and increases linearly on both sides
of the FOE. In Figures 14b-d, the residual vertical rotation is increased from 0.2° to 1.0°. The bold vertical
bar marks the horizontal position of minimum standard deviation, the thin bar marks the location of the FOE
(x_f). It can be seen that the amount of minimum standard deviation rises with increasing disturbance by
rotation, but that the location of minimum standard deviation does not necessarily move away from the
FOE.

Figures 15-17 show the same function under the influence of noise. In Figures 15a-d, no noise was applied
except that introduced by merely rounding the locations of displacement vectors to their nearest integer values. These
Figures show the standard deviation of intersection (square root) at a vertical cross section at position x for
different amounts of vertical rotation with no horizontal rotation. Uniform noise of ±1 pixels was added to the
image locations in Figures 16a-d. In Figures 17a-d, uniform noise of ±2 pixels was applied to the image
locations. It can be seen that the effects of noise are similar to the effects caused by residual rotation
components. The purpose of this error function is to determine where the FOE is located, and how "radial"
the current displacement field is.

If the displacement field is already perfectly derotated, then the location of minimum intersection 
standard deviation is the location of the FOE. Ideally, all vectors pass through the FOE, such that a cross 
section through the FOE yields zero standard deviation. The question is how well the FOE can be located in 
an image which is not perfectly derotated. Figures 18a-d plot the location of minimum intersection standard
deviation under varying horizontal rotation. The vertical rotation is kept fixed for each plot. Horizontal
camera rotations from -1° to +1° are shown on the abscissa (rot). The ordinate (x_0) gives the location of
minimum standard deviation in the range of ±100 pixels around the FOE (marked x_f). The location of
minimum standard deviation depends strongly on the amount of horizontal rotation.

A problem is that the location of minimum standard deviation is not necessarily closer to the FOE when
the amount of rotation is less. The function is only well behaved in a narrow range around zero rotation,
which means that the estimate of the camera rotation must be very accurate to successfully locate the FOE.
The second purpose of this error function is to measure how "radial" the displacement field is after partial
derotation. This should be possible by computing the amount of minimum intersection standard deviation.
Intuitively, a smaller amount of minimum intersection standard deviation should indicate that the
displacement field is less disturbed by rotation. Figures 19a-d and 20a-d show that this is generally true by showing
the amount of minimum intersection standard deviation under varying horizontal rotation. For the noise-free
case in Figure 19a, the amount of minimum intersection standard deviation becomes zero in the absence of
horizontal and vertical rotations, indicating that the derotation is perfect. Unfortunately, the function is not
well behaved even in this relatively small range of rotations (±1.0°). The curve exhibits some sharp local
minima where an algorithm searching for an optimal derotation would get trapped easily. Figures 20a-d
show the same function in the presence of ±2 pixels of uniform noise added to image locations.

The second method of measuring how close a displacement field is to a radial pattern, utilizing linear
correlation, again uses the points y_11-y_15 and y_21-y_25 of intersection at vertical (or horizontal) lines x_1 and
x_2, as illustrated in Figure 21. The displacement vectors P_1→P'_1 through P_5→P'_5 are intersected by two
vertical lines x_1 and x_2, both of which lie on the same side of the FOE that is within area 176. Since the
location of the FOE is not known, the two lines x_1 and x_2 are simply located at a sufficient distance from
any possible FOE-location. This results in two sets of intersection points { (x_1, y_1i) } and { (x_2, y_2i) }. If
all displacement vectors emanate from one single image location, then the distances between
corresponding intersection points in the two sets must be proportional, i.e.,

$$ \frac{y_{1i} - y_{1j}}{y_{2i} - y_{2j}} = \frac{y_{1i} - y_{1k}}{y_{2i} - y_{2k}} \quad \text{for all } i, j, k. \tag{30} $$

Therefore, a linear relationship exists between the vertical coordinates of intersection points on these two
lines. The "goodness" of this linear relationship is easily measured by computing the correlation coefficient
for the y-coordinates of the two sets of points. The resulting coefficient is a real number in the range from
-1.0 to +1.0. If both vertical lines are on the same side of the FOE, then the optimal value is +1.0.
Otherwise, if the FOE lies between the two lines, the optimal coefficient is -1.0. The horizontal position of
the two vertical lines is of no importance, as long as one of these conditions is satisfied. For example, the
left and right border lines of the image can be used.
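
The correlation measure can be sketched in a few lines of Python. The sketch assumes non-vertical displacement vectors and uses the ordinary Pearson correlation coefficient of the two sets of y-intersections; the function names, the positions of the two vertical lines and the synthetic field are assumptions for the example.

```python
import math


def intersect_y(p, p_new, x_line):
    """y-coordinate where the line through p and p_new meets x = x_line."""
    (x, y), (xn, yn) = p, p_new
    return y + (x_line - x) * (yn - y) / (xn - x)


def intersection_correlation(vectors, x1, x2):
    """Pearson correlation of the intersections at the lines x1 and x2."""
    y1 = [intersect_y(p, q, x1) for p, q in vectors]
    y2 = [intersect_y(p, q, x2) for p, q in vectors]
    n = len(vectors)
    m1, m2 = sum(y1) / n, sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in y1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in y2))
    return cov / (s1 * s2)


# Hypothetical radial field expanding from an FOE at (3, 4).
foe = (3.0, 4.0)
starts = [(5.0, 5.0), (1.0, 7.0), (6.0, 0.0), (0.0, 1.0)]
field = [(p, (p[0] + 0.2 * (p[0] - foe[0]), p[1] + 0.2 * (p[1] - foe[1])))
         for p in starts]

same_side = intersection_correlation(field, 10.0, 20.0)   # both right of the FOE
opposite = intersection_correlation(field, 0.0, 10.0)     # FOE between the lines
```

For the noise-free radial field the coefficient reaches the optimal values quoted above, +1.0 when both lines lie on the same side of the FOE and -1.0 when the FOE lies between them.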

Figures 22a-d and 23a-d show plots of the correlation coefficient for intersection of displacement
vectors at two vertical lines under varying horizontal rotations, under the same conditions as in Figures 19a-d
and 20a-d. No noise was applied for Figures 22a-d. In Figures 23a-d, uniform noise of ±2 pixels was
added to the image locations. The optimal coefficient is +1.0 (horizontal axis) in Figures 22a-d and 23a-d.
The shapes of the curves of Figures 22a-d and 23a-d are similar, respectively, to those of Figures 19a-d and 20a-d
for the minimum standard deviations shown earlier, with peaks at the same locations. It is apparent,
however, that each curve has several locations where the coefficient is close to the optimum value (+1.0),
i.e., no distinct global optimum exists, and this is the case not only in the presence of noise (Figures 23a-d).
This fact makes the method of maximizing the correlation coefficient useless for computing the FOE.

The main problem encountered in computing the FOE from rotation, as just described, is that none
of the functions examined was well behaved, making the search for an optimal derotation and the location of
the FOE difficult. Disturbances induced by noise and residual rotation components are amplified by
extending short displacement vectors to straight lines and computing their intersections. The rotation-from-FOE
method described below avoids this problem by guessing an FOE-location first and estimating the optimal
derotation for this particular FOE in the second step.

Given the two images I_0 and I_1 of corresponding points, the main algorithmic steps of this approach are:
(1) Guess an FOE-location x_f^(i) in image I_0 (for the current iteration i); (2) Determine the derotation mapping
r_θ^{-1} r_φ^{-1} which would transform image I_1 into an image I'_1, such that the mapping (x_f^(i), I_0, I'_1) deviates
from a radial mapping (equation 23) with minimum error E^(i); and (3) Repeat steps (1) and (2) until an
FOE-location x_f with the lowest minimum error E is found.

An initial guess for the FOE-location is obtained from knowledge about the orientation of the camera
with respect to the vehicle. For subsequent pairs of frames, the FOE-location computed from the previous
pair can be used as a starting point.

Once a particular x_f has been selected, the problem is to compute the rotation mappings r_θ^{-1} and r_φ^{-1}
which, when applied to the image I_1, will result in an optimal radial mapping with respect to I_0 and x_f.

To measure how close a given mapping is to a radial mapping, the perpendicular distances between
points in the second image (x'_i) and the "ideal" displacement vectors are measured. The "ideal" displacement
vectors lie on straight lines passing through the FOE x_f and the points in the first image x_i. Figure 24
illustrates measuring the perpendicular distance d_i between the lines from x_f through points x_i and the
points x'_i in the second image. The sum of the squared perpendicular distances d_i is the final error measure. For
each set of corresponding image points (x_i ∈ I, x'_i ∈ I'), the error measure is defined as






EP 0 390 051 A2 



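$$ E(x_f) = \sum_i E_i = \sum_i d_i^2 = \sum_i \left[ \frac{1}{\left| \overrightarrow{x_f x_i} \right|} \; \overrightarrow{x_f x_i} \times \overrightarrow{x_f x'_i} \right]^2. \tag{31} $$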
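
The error measure (31) translates directly into code. In the Python sketch below the 2-D cross product is written out componentwise; the function name and the sample coordinates are assumptions for the example, and every point x_i is assumed to differ from x_f.

```python
import math


def radial_error(x_foe, points, points_new):
    """Sum of squared perpendicular distances of the points x'_i from the
    lines through the FOE x_f and the corresponding points x_i (eq. 31)."""
    fx, fy = x_foe
    total = 0.0
    for (x, y), (xn, yn) in zip(points, points_new):
        rx, ry = x - fx, y - fy                  # vector from x_f to x_i
        length = math.hypot(rx, ry)              # assumes x_i != x_f
        d = (rx * (yn - fy) - ry * (xn - fx)) / length
        total += d * d
    return total


# Hypothetical data: one point, FOE at the origin.
single_error = radial_error((0.0, 0.0), [(1.0, 0.0)], [(2.0, 3.0)])

# A perfectly radial field gives zero error at the true FOE.
foe = (3.0, 4.0)
starts = [(5.0, 5.0), (1.0, 7.0), (6.0, 0.0)]
ends = [(x + 0.2 * (x - foe[0]), y + 0.2 * (y - foe[1])) for x, y in starts]
error_at_foe = radial_error(foe, starts, ends)
error_elsewhere = radial_error((0.0, 0.0), starts, ends)
```

In the single-point case the new point lies 3 units off the x-axis, so the error is 9.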



In the following, it is assumed that the amount of residual image rotation in horizontal and vertical
direction is moderately small (less than 4°). In most practical cases, this condition is satisfied, provided that
the time interval between frames is sufficiently small. However, should the amount of vehicle rotation be
very large for some reason, a coarse estimate of the actual rotation can be found (as described above) and
applied to the image before the FOE computation. With small amounts of rotation, the actual rotation
mapping, where points move on horizontal and vertical hyperbolic paths, can be approximated by a
horizontal and vertical shift with constant length over the entire image. Under this condition, the inverse
rotation mapping r_φ^{-1}, r_θ^{-1} can be approximated by adding a constant vector s = (s_x, s_y), which is
independent of the image location:

$$ I'_1 = r_\theta^{-1} \, r_\phi^{-1} \, I_1 \approx s + I_1. \tag{32} $$

Given two images I and I', the error measure (equation 31) becomes

$$ E(x_f, s) = \sum_i \left[ \frac{1}{\left| \overrightarrow{x_f x_i} \right|} \; \overrightarrow{x_f x_i} \times \left( \overrightarrow{x_f x'_i} + s \right) \right]^2, \tag{33} $$

where x_i ∈ I and x'_i ∈ I'. For a given FOE-location x_f, the problem is to minimize E with respect to the two
unknowns s_x and s_y. To reduce this problem to a one-dimensional search, one point x_g, called the "guiding
point", is selected in image I and is forced to maintain zero error (see Figure 25). Figure 25 shows how one
vector x_g is selected from the set of displacement vectors to determine the optimum 2-D shift to be applied
to points x'_i, given an FOE-location x_f.

First, x'_g is forced onto the line x_f x_g and then the entire image I' = {x'_1, x'_2, ...} is translated in the direction
of this line until the error value reaches a minimum. Therefore, the corresponding point x'_g must lie on a
straight line passing through x_f and x_g. Any shift s applied to the image I' must keep x'_g on this straight line,
so

$$ x'_g + s = x_f + \lambda \, (x_g - x_f) \tag{34a} $$

for all s, and thus,

$$ s = x_f - x'_g + \lambda \, (x_g - x_f) \quad (\lambda \in \mathbb{R}). \tag{34b} $$

For λ = 1, s = x_g - x'_g, which is the vector x'_g → x_g. This means that the image I' is shifted such that x_g and
x'_g overlap. This leaves λ as the only free variable and the error function (equation 33) is obtained as

$$ E(\lambda) = \sum_i \left( \lambda A_i + B_i - C_i \right)^2. \tag{35} $$

Differentiating equation 35 with respect to λ and setting the resulting expression to zero yields the parameter
for the optimal shift s_opt as

$$ \lambda_{opt} = \frac{\sum_i A_i C_i - \sum_i A_i B_i}{\sum_i A_i^2}. \tag{36} $$

The optimal shift s_opt and the resulting minimum error E(λ_opt) for the given FOE-location x_f are obtained
by inserting λ_opt into equations (34b) and (35), respectively, giving

$$ E_{min}(x_f) = \lambda_{opt}^2 \sum_i A_i^2 + 2 \lambda_{opt} \left[ \sum_i A_i B_i - \sum_i A_i C_i \right] - 2 \sum_i B_i C_i + \sum_i B_i^2 + \sum_i C_i^2. \tag{37} $$
The normalized error E_n shown in the following results (Figures 27-32) is defined as

E_n(x_f) = [...] E_min(x_f), (38)

where N is the number of displacement vectors used for computing the FOE.
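
The one-dimensional minimisation over λ can be sketched as follows. Because the definitions of the coefficients A_i, B_i and C_i are not legible in this text, the sketch derives its own coefficients by substituting the shift (34b) into the perpendicular-distance error: each distance becomes λ a_i + d_i, with a_i and d_i as computed in the code. These two coefficients, the function names and the synthetic data are therefore assumptions of the example, not the description's own notation; d_i plays the role of B_i - C_i.

```python
import math


def cross(a, b):
    """z-component of the 2-D cross product."""
    return a[0] * b[1] - a[1] * b[0]


def optimal_shift(x_foe, points, points_new, guide):
    """Return (s_opt, e_min) for a given FOE and guiding-point index.

    With s = x_f - x'_g + lam * (x_g - x_f), the distance of x'_i + s
    from the line through x_f and x_i is lam * a_i + d_i.
    Assumes the points are not all collinear with x_f and x_g.
    """
    xg, xg_new = points[guide], points_new[guide]
    g = (xg[0] - x_foe[0], xg[1] - x_foe[1])
    sum_aa = sum_ad = sum_dd = 0.0
    for p, q in zip(points, points_new):
        r = (p[0] - x_foe[0], p[1] - x_foe[1])
        length = math.hypot(r[0], r[1])
        a = cross(r, g) / length
        d = cross(r, (q[0] - xg_new[0], q[1] - xg_new[1])) / length
        sum_aa += a * a
        sum_ad += a * d
        sum_dd += d * d
    lam = -sum_ad / sum_aa                       # zero of dE/d(lam)
    s = (x_foe[0] - xg_new[0] + lam * g[0],
         x_foe[1] - xg_new[1] + lam * g[1])
    e_min = sum_dd - sum_ad * sum_ad / sum_aa    # E evaluated at lam
    return s, e_min


# Hypothetical test field: radial expansion from (3, 4), then a constant
# image shift of (1.5, -2.0) standing in for a small camera rotation.
foe = (3.0, 4.0)
starts = [(5.0, 5.0), (1.0, 7.0), (6.0, 0.0), (0.0, 1.0)]
ends = [(x + 0.2 * (x - foe[0]) + 1.5, y + 0.2 * (y - foe[1]) - 2.0)
        for x, y in starts]
s_opt, e_min = optimal_shift(foe, starts, ends, guide=0)
```

For this synthetic field the recovered shift undoes the simulated image shift and the remaining error vanishes.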

Since in a displacement field caused by pure camera translation all vectors must point away from the FOE, this restriction must hold for any candidate FOE-location (as illustrated in Figure 25). If, after applying $s_{opt}(x_f)$ to the second image $I'$, the resulting displacement field contains vectors pointing towards the hypothesized $x_f$, then this FOE-location is prohibited and can be discarded from further consideration, as is the case at point $x_f$ in Figure 26. Figure 26 shows a field of 5 displacement vectors. The optimal shift $s_{opt}$ for the given $x_f$ is shown as a vector in the lower right-hand corner. When $s_{opt}$ is applied to point $x'_1$, the resulting displacement vector (shown fat) does not point away from the FOE. Since its projection onto the line $x_f x_1$ points towards the FOE, it is certainly not consistent with a radial expansion pattern.
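The test for a prohibited FOE-location may be sketched as follows. This is a minimal illustration in Python, assuming that the second-image points have already been shifted by the optimal shift; the function name `is_prohibited` and the tolerance parameter are hypothetical and not part of the described embodiment.

```python
def is_prohibited(foe, points, shifted_points, tol=0.0):
    """Return True if any displacement vector points towards the candidate FOE.

    foe            -- candidate FOE location (x, y)
    points         -- (x, y) positions in the first image
    shifted_points -- matching positions in the second image after the optimal shift
    tol            -- assumed tolerance for small negative projections
    """
    fx, fy = foe
    for (x0, y0), (x1, y1) in zip(points, shifted_points):
        rx, ry = x0 - fx, y0 - fy      # radial direction away from the FOE
        dx, dy = x1 - x0, y1 - y0      # displacement vector
        if rx * dx + ry * dy < -tol:   # negative projection: vector points inwards
            return True
    return False
```

A candidate location for which this function returns True would be discarded before the error value is considered.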

The final algorithm for determining the direction of heading as well as horizontal and vertical camera rotations is the "find-FOE algorithm", which consists of the following steps: (1) Guess an initial FOE $x_f^0$, for example the FOE-location obtained from the previous pair of frames; (2) Starting from $x_f^0$, search for a location $x_f^{opt}$ where $E_{min}(x_f^{opt})$ is a minimum. A technique of steepest descent is used, where the search proceeds in the direction of least error; and (3) Determine a region around $x_f^{opt}$ in which the error is below some threshold. The search for this FOE-area is conducted at FOE-locations lying on a grid of fixed width. In the examples shown, the grid spacing is 10 pixels in both x- and y-directions.

The error function $E(x_f)$ is computed in time proportional to the number of displacement vectors N. The
final size of the FOE-area depends on the local shape of the error function and can be constrained not to 
exceed a certain maximum M. Therefore, the time complexity is O(MN). 
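Step (2) of the find-FOE algorithm, the descent over grid locations, may be sketched as follows. The Python code is a simplified illustration under stated assumptions: `error_fn` is a hypothetical callable returning the minimum error for a candidate FOE-location (and `float("inf")` for prohibited locations), the 8-neighbourhood is an assumed choice, and the quadratic `toy_error` merely stands in for a real error surface.

```python
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def descend(error_fn, start, step=10, max_iter=1000):
    """Move from grid node to grid node, always towards the neighbour with least error."""
    current, current_err = start, error_fn(start)
    for _ in range(max_iter):
        candidates = [(current[0] + dx * step, current[1] + dy * step)
                      for dx, dy in NEIGHBOURS]
        best = min(candidates, key=error_fn)
        best_err = error_fn(best)
        if best_err >= current_err:   # no neighbour improves: local minimum reached
            break
        current, current_err = best, best_err
    return current, current_err


def toy_error(p):
    """Stand-in error surface with its minimum at (30, -20); not real data."""
    return (p[0] - 30) ** 2 + (p[1] + 20) ** 2


foe, err = descend(toy_error, start=(0, 0))
```

Each call to `error_fn` costs time proportional to N, so limiting the number of visited grid nodes keeps the overall cost in line with the O(MN) bound stated above.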

The first set of experiments was conducted on synthetic imagery to investigate the behavior of the error measure under various conditions, namely, the average length of the displacement vectors (longer displacement vectors lead to a more accurate estimate of the FOE), the amount of residual rotation components in the image, and the amount of noise applied to the location of image points. Figures 27a-d show the distribution of the normalized error $E_n(x_f)$ for a sparse and relatively short displacement field containing 7 vectors. Residual rotation components of ±2° in horizontal and vertical direction are present in Figures 27b-d to visualize their effects upon the image. This displacement field was used with different average vector lengths (indicated as length-factor) for the other experiments on synthetic data. The displacement vector through the guiding point is marked with a heavy line. The choice of this point is not critical, but it should be located at a considerable distance from the FOE to reduce the effects of noise upon the direction of the vector $x_f x_g$. In Figures 27a-d, which show the displacement field and minimum error at selected FOE-locations, the error function is sampled in a grid with a width of 10 pixels over an area of 200 by 200 pixels around the actual FOE, which is marked by small square 178. At each grid point, the amount of normalized error (equation 38) is indicated by the size of circle 180. Heavy circles 180 indicate error values which are above a certain threshold. Those FOE-locations that would result in displacement vectors which point towards the FOE (as described above) are marked as prohibited (+). It can be seen that the shape of the 2-D error function changes smoothly with different residual rotations over a wide area and exhibits its minimum close to the actual location of the FOE. Figure 27a represents no residual rotation, Figure 27b represents 2.0° of horizontal camera rotation (to the left), Figure 27c represents 2.0° of vertical rotation (upwards), and Figure 27d represents -2.0° of vertical rotation (downwards).

Figures 28 to 33 show the effects of various conditions upon the behavior of this error function in the 
same 200 by 200 pixel square around the actual FOE as in Figures 27a-d. 

Figures 28a-e show how the shape of the error function depends upon the average length (with length factors varying from 1 to 15) of the displacement vectors in the absence of any residual rotation or noise (except digitization noise). The minimum of the error function becomes more distinct with increasing amounts of displacement. Figures 29a-e show the effect of increasing residual rotation in the horizontal direction upon the shape of the error function for relatively short vectors (length factor of 2.0) in the absence of noise.

Figures 30a-e show the effect of residual rotation in the vertical direction upon the shape of the error function for short vectors (length factor of 2.0) in the absence of noise. Here, it is important to notice that the displacement field used is extremely nonsymmetric along the Y-axis of the image plane. This is motivated by the fact that in real ALV images, long displacement vectors are most likely to be found from points on the ground, which are located in the lower portion of the image. Therefore, positive and negative vertical rotations have been applied in Figures 30a-e.

In Figures 31a-j, residual rotations in both horizontal and vertical direction are present, for short vectors with a length factor of 2.0. In Figures 31a-e, the error function is quite robust against rotational components in the image. Figures 31f-j show the amounts of optimal linear shift $s_{opt}$ under the same conditions. The result in Figure 31e shows the effect of a large combined rotation of 4.0°/4.0° in both directions. Here, the minimum of the error function is considerably off the actual location of the FOE because of the error induced by using a linear shift to approximate the nonlinear derotation mapping. In such a case, it would be necessary to actually derotate the displacement field by the amount of rotation equivalent to the $s_{opt}$ found at the minimum of this error function and repeat the process with the derotated displacement field.

The effects of various amounts of uniform noise applied to image point coordinates, for a constant average vector length of 5.0, are shown in Figures 32a-e. For this purpose, a random amount (with uniform distribution) of displacement was added to the original (continuous) image location and then rounded to integer pixel coordinates. Random displacement was applied in ranges from ±0.5 to ±4.0 pixels in both horizontal and vertical direction. The shape of the error function becomes flat around the local minimum of the FOE with increasing levels of noise. The displacement field contains only 7 vectors. What is observed here is that the absolute minimum error increases with the amount of noise. The minimum error in Figures 32a-e can thus serve as an indicator for the amount of noise present in the image and the reliability of the final result.
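The noise model used for these synthetic experiments may be illustrated as follows. The Python sketch assumes independent uniform offsets per coordinate; the function name `add_uniform_noise` and the sample points are hypothetical.

```python
import random


def add_uniform_noise(points, amplitude, rng=None):
    """Add a uniform offset in [-amplitude, +amplitude] to each coordinate, then round to pixels."""
    rng = rng or random.Random()
    noisy = []
    for x, y in points:
        nx = x + rng.uniform(-amplitude, amplitude)
        ny = y + rng.uniform(-amplitude, amplitude)
        noisy.append((round(nx), round(ny)))   # integer pixel coordinates
    return noisy


# Made-up continuous image locations, perturbed by up to 4.0 pixels.
points = [(10.2, 20.7), (100.0, 50.5), (-35.4, 12.0)]
noisy_points = add_uniform_noise(points, amplitude=4.0, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes an experiment repeatable, whereas separate unseeded runs give different noise for the same constellation of points.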

The length of the displacement vectors is an important factor. The shorter the displacement vectors are, the more difficult it is to locate the FOE correctly in the presence of noise. Figures 33a and 33b show the error functions for two displacement fields with different average vector lengths (length factor 2.0 and 5.0, respectively). For the shorter displacement field (length-factor 2.0) in Figure 33a, the shape of the error function changes dramatically under the same amount of noise (compare Figure 31a). A search for the minimum error (i.e., local minimum) inevitably converges towards an area 182 indicated by the small arrow, far off the actual FOE. For the image with length-factor 5.0 (Figure 33b), the minimum of the error function coincides with the actual location of the FOE 178. The different result for the same constellation of points in Figure 32d is caused by the different random numbers (noise) obtained in each experiment. This experiment confirms that a sufficient amount of displacement between consecutive frames is essential for reliably determining the FOE and thus the direction of vehicle translation.

The performance of this FOE algorithm is shown below on a sequence of real images taken from a moving ALV. Also, it is shown how the absolute velocity of the vehicle can be estimated after the location of the FOE has been determined. The essential measure used for this calculation is the absolute height of the camera above the ground, which is constant and known. Given the absolute velocity of the vehicle, the absolute distance from the camera of 3-D points in the scene can be estimated using equation 20.

Next, velocity over ground may be computed. After the FOE has been computed following the steps outlined above, the direction of vehicle translation and the amount of rotation are known. From the derotated displacement field and the location of the FOE, the 3-D layout of the scene can be obtained up to a common scale factor (equation 20). As pointed out above, this scale factor and, consequently, the velocity of the vehicle can be determined if the 3-D position of one point in space is known. Furthermore, it is easy to show that it is sufficient to know only one coordinate value of a point in space to reconstruct its position in space from its location in the image.

Since the ALV travels on a fairly flat surface, the road can be approximated as a plane which lies parallel to the vehicle's direction of translation (see Figure 34). This approximation holds at least for a good part of the road in the field of view of the camera 184. Figure 34 shows a side view of camera 184 traveling parallel to flat surface 186. Camera 184 advances in direction Z, such that a 3-D point on ground surface 186 moves relative to camera 184 from $Z_0$ to $Z_1$. Depression angle $\phi$ can be determined from the location of FOE 188 in image 128. The height of camera 184 above ground surface 186 is given.

Since the absolute height of camera 184 above the ground 186 is constant and known, it is possible to estimate the positions of points 192 and 194 on road surface 186 with respect to the vehicle 132 (of Figure 6) in absolute terms. From the changing distances between points 192 and 194 and camera 184, the actual advancement and speed can be determined.

First, a coordinate system 130 is introduced which has its origin O in the lens center 126 of the camera 184. The Z-axis of coordinate system 130 passes through FOE 188 in the image plane 128 and aims in the direction of translation. The original camera-centered coordinate system (X Y Z) 130 is transformed into the new frame (X' Y' Z') merely by applying horizontal and vertical rotation until the Z-axis lines up with FOE 188. The horizontal and vertical orientation in terms of "pan" and "tilt" are obtained by "rotating" FOE 188 ($x_f$ $y_f$) into the center of image 128 (0 0) using equations 14 and 15 in the following:



4f - -Ua~ 



(39) 
(40) 



(f 2 ♦ x 2 f ) f 2 - x 2 f y 2 f _ 



The two angles $\theta_f$ and $\phi_f$ represent the orientation of camera 184 in 3-D with respect to the new coordinate system (X' Y' Z'). This allows determination of the 3-D orientation of the projecting rays passing through image points $y_0$ and $y_1$ by use of the inverse perspective transformation. A 3-D point X in the environment whose image x = (x y) is given lies on a straight line in space defined by



X - 



cos* 
0 

sintff 



f 



COS^f 

-cosdf sin^f 



COSlf COS^f 



(41) 



For points 192 and 194 on road surface 186 of Figure 34, the Y-coordinate is -h, where h is the height of camera 184 above ground 186. Therefore, the value of the scale factor $\kappa_s$ for a point on the road surface ($x_s$ $y_s$) can be estimated as



(42) 



and its 3-D distance is found by inserting $\kappa_s$ into equation 41 as

z , , h X s **°*f - y, cos^f *M f - f cos* f cos^ f ^ " (43) 

If a point on the ground is observed at two instances of time, $x_s$ at time $t$ and $x'_s$ at $t'$, the resulting distances from the vehicle, $Z_s$ at $t$ and $Z'_s$ at $t'$, yield the amount of advancement $\Delta Z_s(t, t')$ and estimated velocity $V_s(t, t')$ in this period as

$$\Delta Z_s(t, t') = Z_s - Z'_s \tag{44}$$

$$V_s(t, t') = \frac{Z_s - Z'_s}{t' - t}. \tag{45}$$
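The advancement and velocity of equations 44 and 45 can be illustrated for the side view of Figure 34. The Python sketch below is a simplified geometry under stated assumptions and is not the transformation of equation 41: it assumes zero pan, an optical axis depressed by a known angle below the direction of travel, an image y-coordinate measured upwards from the image center, and a focal length in pixels derived from a pinhole model with a 48° vertical field of view over 512 rows. All function and variable names are hypothetical.

```python
import math


def ground_distance(y, f, h, depression):
    """Forward distance Z to a ground point imaged at vertical offset y.

    y          -- image coordinate measured upwards from the image centre (pixels)
    f          -- focal length (pixels)
    h          -- camera height above the ground plane
    depression -- angle of the optical axis below the direction of travel (radians)
    """
    denom = f * math.sin(depression) - y * math.cos(depression)
    if denom <= 0.0:
        raise ValueError("ray does not hit the ground in front of the camera")
    return h * (f * math.cos(depression) + y * math.sin(depression)) / denom


def advance_and_velocity(y0, y1, t0, t1, f, h, depression):
    """Advancement Z_s - Z'_s (equation 44) and velocity (equation 45)."""
    dz = ground_distance(y0, f, h, depression) - ground_distance(y1, f, h, depression)
    return dz, dz / (t1 - t0)


# Assumed pinhole focal length: 512 rows spanning a 48 degree vertical field of view.
f_px = 256.0 / math.tan(math.radians(24.0))
dz, v = advance_and_velocity(-100.0, -150.0, 0.0, 0.5, f_px, 3.3, math.radians(16.3))
```

Under these assumptions the forward distance does not depend on the horizontal image coordinate, which is why only the vertical offset appears in the sketch.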

Image noise and tracking errors have a large impact upon the quality of the final velocity estimate. Therefore, the longest available displacement vectors are generally selected for this measurement, i.e., those vectors which are relatively close to the vehicle. Also, in violation of the initial assumption, the ground surface is never perfectly flat. In order to partially compensate for these errors and to make the velocity estimate more reliable, the results of the measurements on individual vectors are combined. The length of each displacement vector $|x_i - x'_i|$ in the image is used as the weight for its contribution to the final result. Given a set of suitable displacement vectors $S = \{x_i - x'_i\}$, the estimate of the distance traveled by the vehicle is taken as the weighted average of the measurements $\Delta Z_i$ on individual vectors

$$\Delta Z(t, t') = \frac{\sum_i |x_i - x'_i|\,\Delta Z_i}{\sum_i |x_i - x'_i|} \tag{46}$$

and the final estimate for the vehicle velocity is

$$V(t, t') = \frac{\Delta Z(t, t')}{t' - t}. \tag{47}$$



This computation was applied to a sequence of real images which is described below. 
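The weighted combination of equations 46 and 47 may be sketched as follows. The Python code is illustrative only; the function names and the sample values are hypothetical, and the per-vector advancements are assumed to have been measured beforehand.

```python
import math


def weighted_advance(vectors, advances):
    """Equation 46: average the per-vector advancements, weighted by image displacement length."""
    weights = [math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in vectors]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all displacement vectors have zero length")
    return sum(w * dz for w, dz in zip(weights, advances)) / total


def velocity(delta_z, t0, t1):
    """Equation 47: distance travelled divided by the elapsed time."""
    return delta_z / (t1 - t0)


# Two made-up vectors with image lengths 5 and 10 pixels and advancements of 3 m and 6 m.
vectors = [((0.0, 0.0), (3.0, 4.0)), ((0.0, 0.0), (0.0, 10.0))]
delta_z = weighted_advance(vectors, [3.0, 6.0])
speed = velocity(delta_z, 0.0, 0.5)
```

Because the weights are the image lengths, the longer vector dominates the result, which matches the preference for long displacement vectors stated above.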

In the following, results of the FOE-algorithm and computation of the vehicle's velocity over ground are shown on a real image sequence taken from the moving ALV. The original sequence was provided on standard video tape with a frame rate of 30 per second. Out of this original sequence, images were taken in 0.5 second intervals, i.e., at a frame rate of 2 per second, in order to reduce the amount of storage and computation. The images were digitized to a spatial resolution of 512 x 512, using only the Y-component (luminance) of the original color signal.

Figures 35a-i show the edge images of frames 182-190 of an actual test, with points 1-57 being tracked and labeled with ascending numbers. These figures show the original image sequence taken from the moving ALV after edge detection and point detection. Figures 35j-p (frames 191-197), which include additional points 58-78, show the original image sequence after edge detection and point selection. An adaptive windowing technique was developed as an extension of relaxation labeling disparity analysis for the selection and matching of tracked points. The actual image location of each point is the lower-left corner of the corresponding mark. The resulting data structure consists of a list of point observations for each image (time), e.g.,
time t0: ((p1 t0 x1 y1) (p2 t0 x2 y2) (p3 t0 x3 y3) ... )
time t1: ((p1 t1 x1 y1) (p2 t1 x2 y2) (p3 t1 x3 y3) ... )

Points are given a unique label when they are encountered for the first time. After the tracking of a point 
has started, its label remains unchanged until this point is no longer tracked. When no correspondence is 
found in the subsequent frame for a point being tracked, either because of occlusion, or the feature left the 
field of view, or because it could not be identified, tracking of this point is discontinued. Should the same 
point reappear again, it is treated as a new item and given a new label. Approximately 25 points per image 
have been selected in the sequence shown in Figures 35a-i. In the search for the focus of expansion, the 
optimal FOE-location from the previous pair of frames is taken as the initial guess. For the very first pair of 
frames (when no previous result is available), the location of the FOE is guessed from the known camera 
setup relative to the vehicle. The points which are tracked on the two cars (24 and 33) are assumed to be known as moving and are not used as reference points to compute the FOE, vehicle rotation, and velocity. This information is eventually supplied by the reasoning processes in conjunction with the qualitative scene model (Figure 2).
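The labeling rules for tracked points may be sketched as follows. The Python code is a minimal illustration, assuming that the correspondence search itself is performed elsewhere and supplies a mapping from detections to labels of the previous frame; the class and parameter names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Observation:
    label: int   # unique point label
    time: int    # frame index
    x: float
    y: float


class Tracker:
    """Keeps one list of point observations per frame."""

    def __init__(self):
        self._labels = count(1)
        self.frames = {}   # frame index -> list of Observation

    def add_frame(self, time, detections, matches):
        """detections: list of (x, y); matches: {detection index: label in previous frame}."""
        prev_labels = {obs.label for obs in self.frames.get(time - 1, [])}
        frame = []
        for i, (x, y) in enumerate(detections):
            label = matches.get(i)
            if label not in prev_labels:   # unmatched point: start a new track
                label = next(self._labels)
            frame.append(Observation(label, time, x, y))
        self.frames[time] = frame
        return frame


tracker = Tracker()
f0 = tracker.add_frame(0, [(10.0, 10.0), (50.0, 60.0)], {})
f1 = tracker.add_frame(1, [(12.0, 9.0)], {0: 1})                 # second point lost
f2 = tracker.add_frame(2, [(14.0, 8.0), (55.0, 58.0)], {0: 1})   # it reappears unmatched
```

In this example the reappearing point receives the new label 3 rather than its earlier label 2, in line with the rule that a lost point is treated as a new item.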

Figures 36a-p illustrate the displacement vectors and estimates of vehicle motion for the image 
sequence shown in Figures 35a-p. Shaded area 190 marks the possible FOE locations and circle 178 inside 
of area 190 is the FOE with the lowest error value. The FOE-ratio measures the flatness of the error function 
inside area 190. 

Also, Figures 36a-p show the results of computing the vehicle's motion for the same sequence as in the 
previous Figure. Each frame t displays the motion estimates for the period between t and the previous 
frame t - 1. Therefore, no estimate is available at the first frame (182). Starting from the given initial guess, 
the FOE-algorithm first searches for the image location, which is not prohibited and where the error function 
(equation 35) has a minimum. 

The optimal horizontal and vertical shift resulting at this FOE-location is used to estimate the vehicle's 
rotations around the X- and Y-axis. This point, which is the initial guess for the subsequent frame, is marked 
as a small circle inside the shaded area. The equivalent rotation components are shown graphically on a
±1° scale. They are relatively small throughout the sequence such that it was never necessary to apply
intermediate derotation and iteration of the FOE-search. Along with the original displacement vectors (solid 
lines), the vectors obtained after derotation are shown with dashed lines. 

After the location with minimum error has been found, it is used as the seed for growing a region of potential FOE-locations. The growth of the region is limited by two restrictions: (1) The ratio of maximum to minimum error inside the region is limited, i.e.,

$$\rho^i = \frac{E_n^i}{E_n^{min}} \le \rho_{lim}$$

(see equation 38 for the definition of the error function $E_n$). No FOE-location for which the error ratio $\rho^i$ exceeds the limit $\rho_{lim}$ is joined to the region. Thus the final size of the region depends on the shape of the error function. In this example, the ratio $\rho_{lim}$ was set at 4.0. Similarly, no prohibited locations (Figure 26) are considered; (2) The maximum size M of the FOE-region is given and is not exceeded, regardless of the error values. The resulting error ratio $\rho_{max} = \max(\rho^i)$ for the points inside the region indicates the shape of the error function for this area. A low value for the ratio $\rho_{max}$ indicates a flat error function. The value for $\rho_{max}$ is shown as FOE-RATIO in every image.
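The region growing described above may be sketched as follows. The Python code is an illustration under stated assumptions: `error_fn` and `prohibited` are hypothetical callables, the 8-connected grid neighbourhood is an assumed choice, and the default maximum size is arbitrary. The toy error surface is not taken from the figures.

```python
from collections import deque


def grow_foe_region(error_fn, seed, rho_lim=4.0, max_size=50, step=10,
                    prohibited=lambda p: False):
    """Grow a connected region of grid nodes around the minimum-error seed.

    Returns (region, foe_ratio), where foe_ratio is the largest ratio of a
    member's error to the seed error.
    """
    e_min = error_fn(seed)
    if e_min <= 0.0:
        raise ValueError("seed error must be positive to form a ratio")
    ratios = {seed: 1.0}
    queue = deque([seed])
    while queue and len(ratios) < max_size:
        x, y = queue.popleft()
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                node = (x + dx, y + dy)
                if node in ratios or prohibited(node):
                    continue
                ratio = error_fn(node) / e_min
                if ratio <= rho_lim and len(ratios) < max_size:
                    ratios[node] = ratio
                    queue.append(node)
    return set(ratios), max(ratios.values())


def toy_error(p):
    """Stand-in error surface with minimum value 1.0 at (30, -20)."""
    return 1.0 + ((p[0] - 30) ** 2 + (p[1] + 20) ** 2) / 100.0


region, foe_ratio = grow_foe_region(toy_error, (30, -20))
```

On this toy surface only the seed and its eight direct grid neighbours satisfy the ratio limit of 4.0, so a steeper error function yields a smaller region.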

For the computation of absolute vehicle velocity, only a few prominent displacement vectors were selected in each frame pair. The criteria were that the vectors be located below the FOE and that their length be more than 20 pixels. The endpoints of the selected (derotated) vectors are marked with dark dots.
The parameter used for the computation of absolute advancement is the height of the camera above the 
ground, which is 3.3 meters (11 feet). 
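The selection criteria for the velocity computation may be sketched as follows. The Python code assumes an image y-coordinate that increases upwards, so that "below the FOE" means a smaller y value, and it requires both endpoints of a vector to lie below the FOE; both conventions and the function name are assumptions.

```python
import math


def select_velocity_vectors(vectors, foe, min_length=20.0):
    """Keep vectors that lie below the FOE and are longer than min_length pixels."""
    foe_y = foe[1]
    selected = []
    for (x0, y0), (x1, y1) in vectors:
        below = y0 < foe_y and y1 < foe_y
        if below and math.hypot(x1 - x0, y1 - y0) > min_length:
            selected.append(((x0, y0), (x1, y1)))
    return selected


# Made-up vectors: long and below, short and below, long but above the FOE.
candidates = [((-40.0, -60.0), (-60.0, -90.0)),
              ((10.0, -20.0), (12.0, -25.0)),
              ((30.0, 40.0), (60.0, 80.0))]
chosen = select_velocity_vectors(candidates, foe=(0.0, 0.0))
```

Only the first made-up vector passes both criteria in this example.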

Figure 37 illustrates hardware embodiment 200 of the present invention. Camera 202 may be a Hitachi television camera having a 48° vertical field of view and a 50° horizontal field of view. Camera 202 may be set at a depression angle of 16.3° below the horizon. Image sequences 204 acquired by camera 202 are transmitted to tracking means 206 comprising an PS processor 208 and a VAX 11/750 computer 210. Tracking means 206 tracks tokens between frames of image sequences 204. The output of means 206 goes to means 212 for matching of tokens and corresponding images. For the two different computers 210 and 218, a VAX-Symbolics bidirectional network protocol means 214 is connected between means 212 and means 216, which includes Symbolics 3670 computer 218, though it is possible to use one computer, thereby eliminating means 214. Computer 218 provides processing for obtaining fuzzy focus of expansion and rotations/translation estimates of motion. The language environment used with computer 218 is Common LISP.

The methods and processes in the embodiment of the present invention are implemented with the ensuing programming.



Claims 

1. A method for determining self-motion of an imaging system in an environment, characterized by:

computing a focus of expansion (FOE) location from two-dimensional displacement vectors of successive, two-dimensional images;

computing a fuzzy FOE region that is a qualitative indication of the FOE location, wherein the fuzzy FOE region is an area of possible FOE locations for each image;

determining an approximate direction of heading and amount of rotation of said imaging system, relative to its own reference coordinate system, from the fuzzy FOE; and

removing the effects of rotation of said imaging system from the displacement vectors.

2. Method according to claim 1 further characterized by:

acquiring the successive two-dimensional images of the environment of said imaging system;

selecting features in the images; and

determining the two-dimensional displacement vectors for the features of each pair of the successive images.

3. An imaging apparatus capable of determining its own self-motion from its imaging, characterized by:

imaging means (202, 204) for detecting a sequence of images in a field of view;

token tracking means (206), connected to said imaging means, for determining two-dimensional displacement vectors of selected tokens for each consecutive pair of images, and wherein said token tracking means tracks stationary parts of a visual environment in the images, using two-dimensional references such as corner points, contour segments, and region boundaries, from image to image, wherein the two-dimensional displacement vectors are the result of camera motion;

seeker means (208, 210), connected to said token tracking means, for selecting candidate locations for focus of expansion under forward translation of said imaging means and for focus of contraction under backward translation of said imaging device, for forming a connected image region of the candidate locations and a range of corresponding rotations, and for outputting focus of expansion or contraction and image rotation estimates; and

optimal rotation means (218), connected to said token tracking means and to said seeker means, for determining optimal three-dimensional rotation angles plus an error value for a selected candidate location for focus of expansion or contraction.






4. Apparatus according to claim 3, characterized in that

said seeker means (208, 210) receives signals conveying the two-dimensional displacement vectors from said token tracking means; and

said optimal rotation means (218) receives signals conveying the two-dimensional displacement vectors from said token tracking means and signals conveying a single focus of expansion or contraction location from said seeker means, and sends the determined optimal three-dimensional rotation angles (pan and tilt) plus the error value for the single focus of expansion or contraction location, to said seeker means.

5. Apparatus according to claim 4, characterized in that said optimal rotation means (218) determines the optimal three-dimensional rotation angles (pan and tilt) for the single focus of expansion (direction of heading) or contraction location, by simulating the effects of reverse imaging means rotations upon a given set of displacement vectors from said token tracking means, and thus virtually rotating said sensing device until a modified displacement field approaches closest to a radial expansion pattern relative to the selected candidate location for focus of expansion or contraction, the optimal three-dimensional (pan and tilt) rotation angles indicating needed reverse rotation of said imaging means and the error value, wherein said error value is a deviation from radial displacement field.

6. Apparatus according to claim 5, characterized in that said optimal rotation means determines velocity of said imaging device.

7. An imaging system capable of determining its self-motion from its imaging, characterized by:

derotation means (122) for derotating two-dimensional displacement vectors from consecutive pairs of two-dimensional images to remove rotational effects of said imaging system;

first computing means (124), connected to said derotation means, for computing a fuzzy focus of expansion (FOE) from the two-dimensional displacement vectors, wherein the fuzzy FOE is a two-dimensional region of possible focus-of-expansion locations on a two-dimensional image; and

second computing means (118), connected to said derotation means, for computing self-motion parameters of said imaging system, from the fuzzy FOE.

8. System according to claim 7 further characterized by:

sensing means (202, 112) acquiring the two-dimensional images;

feature detection means (114), connected to said sensing means, for detecting, extracting and tracking features in the two-dimensional images; and

third computing means (118), connected to said feature detection means, for computing the two-dimensional displacement vectors from the features.

Method and apparatus for computing the self-motion of moving imaging devices.

Determining the self-motion in space of an imaging device (e.g., a television camera) by analyzing image sequences obtained through the device. Three-dimensional self-motion is expressed as a combination of rotations ($\theta$, $\phi$) about the horizontal and vertical camera axes and the direction of camera translation. The invention computes the rotational and translational components of the camera's self-motion exclusively from visual information. Robust performance is achieved by determining the direction of heading (i.e., the focus of expansion) as a connected region instead of a single location on the image plane. The method can be used to determine when the direction of heading is outside the current field of view and when there is zero or very small camera translation.